We are the priests of the Temple of Syrinx
Our great computers filled these Hallowed Halls
How much data does a business need to store? How much does a dot-com need to store? What about SOX? The storage vendors must be pouring money into congressmen’s pockets to make the requirements all the more heinous: every email, every photo, every random document, and the full diff history of everything so far. EMC must be salivating at where the requirements are headed.
I’ve been trying to work it out at the thumbnail-math level. The metrics here should be 1) product and 2) customer. Every product has a text description, photo(s), price history, qty, and a half-dozen other table entries. The photos are the biggest wildcard here; the other stuff is really just some text. The product metric is trackable. I (as CTO) can set it to what I want. I can tell Marketing that they get xx kilobytes per product, or rather they tell me what they want to do, and I can do the math pretty quickly and tell them that storage will cost X.
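Here is a minimal sketch of that thumbnail math. Every number in it (catalog size, kilobytes per product, price per gigabyte) is a made-up placeholder rather than a real figure, and the function name `storage_cost` is purely illustrative.

```python
def storage_cost(num_products, kb_per_product, dollars_per_gb):
    """Return (total_gb, total_cost) for a product catalog.

    Uses binary units: 1 GB = 1024 * 1024 KB.
    """
    total_kb = num_products * kb_per_product
    total_gb = total_kb / (1024 * 1024)
    return total_gb, total_gb * dollars_per_gb


# Hypothetical inputs: 100,000 products, 512 KB each (mostly photos),
# and an assumed all-in storage price of $2.00 per GB.
gb, cost = storage_cost(num_products=100_000,
                        kb_per_product=512,
                        dollars_per_gb=2.0)
print(f"{gb:.1f} GB, ${cost:.2f}")
```

The point is that the product side is a straight multiplication: Marketing changes one input, and the cost falls out directly.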
Customers are another story. There is seemingly no end to the amount of data to track there. We’re way past phone number and shoe size. Every click, every photo uploaded, downloaded, product added to the cart, not added to the cart, and the timestamps on every freaking event since Goldstein set up the first cameras all over Airstrip One. Beyond that, there’s all the derivative data: take any two or three data points that I have mentioned so far, mash them together, and voila! A whole new data set that needs to be put somewhere, with its own little dashboard.
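To get a rough feel for how fast the derivative data multiplies, here is a small sketch that counts how many distinct data sets you could get by mashing together any two or three base data points. It assumes every combination is treated as its own data set, which is a deliberately pessimistic simplification; the function name `derived_dataset_count` is illustrative only.

```python
from math import comb


def derived_dataset_count(num_data_points, max_combo=3):
    """Count the ways to combine 2..max_combo distinct base data points."""
    return sum(comb(num_data_points, k) for k in range(2, max_combo + 1))


# Hypothetical base counts, purely for illustration.
for n in (10, 20, 50):
    print(n, "base data points ->", derived_dataset_count(n), "possible derived sets")
```

With only 10 base data points there are already 165 possible pairs and triples, and the count grows roughly with the cube of the number of base points.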
There is an economy of data, and it’s not the cost of physical storage. It’s the cost of eyeballs on all these little numbers running around. It’s the Kuhn-like convincing of truth (not Truth) to your peers that your statistics are the right ones to measure. Who wins in middle management? The person who can convince the most other employees that _their_ view of the numbers is correct.
Punch cards begat tapes; tapes begat disks; disks begat databases; databases begat data warehouses; data warehouses begat “business intelligence”. BI begat dashboards. Everything up until that last step was sheer fecundity. The BI-to-dashboard stage is the first step toward a “meta” level synthesis from all that data. We now have too much data to comprehend. But will the trend continue toward less presentation? Will it swing back toward the swamping? Who can trust the synthesised conclusions? Where is the rigour around the methodologies that produce these dashboards? What next? “data consolidation”? “derivative decisionment”? It doesn’t really matter to the storage people: every derivative, every summary, every consolidated report still requires that much more storage. Every time we breed one set of numbers with another, we need to store all the children.