While many startups managed to cash in on the Big Data hype by raising large amounts of money to make the Hadoop framework easier to deploy and manage, we have not seen many truly innovative solutions that deliver the raw power of Hadoop in readily available applications.
The opportunities for such applications, in my mind, fall into three groups:
The first group is unused data: data that is already collected but not harnessed to deliver any business value. One example of currently unused data comes from the area of DevOps. The two main products in this field that I am familiar with are Chef and Puppet. With the recent popularity of these tools, and the increasing number of devices (nodes) they can manage, these products are in a unique position of knowing a lot about the environments in which they reside. Here is the kicker, though: in today's versions of these products, much of that visibility is lost, discarded as soon as it goes out of scope. Can you imagine the opportunities to leverage this data to drive new value through pattern recognition or usage modeling and forecasting?
Drive New Value From Summary Data
This one is close to my heart, as I spent close to nine years developing environmental, health and transportation data management systems. Often in this area, large data sets are extensively summarized at the county or state level before any meaningful analysis takes place. By the time this data becomes available at the national level, we have already lost much of its meaningful insight. The opportunity here is not only to leverage the detail, but also to augment it with related data sets to provide a fuller view.
Harness Heterogeneous Data Sensors
Whether we like it or not, we, and many things around us, are "nodes" in elaborate sensor networks collecting Big Data. I know this sounds scary, so let me explain. The devices we carry, the vending machines we frequent, the thermostats monitoring the temperature in our homes: all of these have sensors, and increasing numbers of these devices collect actionable data. Traditionally, we think of these data sources as silos: "If the vending machine indicates it ran out of soda, refill it." The opportunity in this area is to develop the capability to augment already existing points of potential data collection: mobile sensors, if you wish. These would be high-velocity sensor devices capable of acquiring multimedia input (visual, air and temperature samples, motion indicators, etc.) and relaying this data to predefined aggregation points. These devices will deliver new data points from existing infrastructure, allowing for new pattern recognition and the development of new delivery models.