It was inevitable that data would emerge as one of the most complex and important topics of the next decade. The expansion in the amount and types of data we have been accumulating across systems, physical and virtual, applications, and electronic devices is astounding. These environments generate huge amounts of data, from structured to unstructured, and with the increasing use of digital images, social media, and data streams this explosion is continuing beyond what we could have imagined. Many organizations have created workarounds to manage these large volumes of complex data using warehouses, data marts, and by moving subsets of data into analytic tools. But in my view we have reached a tipping point where these approaches aren't enough, since much of this data remains in its existing silos.
Making matters worse, there is no way to leverage this data based on the context of the business problem being addressed. Enterprises need a big data strategy that enables query and analysis across different sources and types of data. The key to success in the future will be managing all of this data in new ways.
Some of the initial big data use cases have evolved from the problem that occurs when applications generate huge volumes of data: search engine data, information generated from gene screening, or a company trying to analyze customer buying patterns by combining traditional structured customer data with unstructured customer call center notes. Many technical early adopters assume that big data is synonymous with Hadoop (a framework that breaks data into small fragments of work so that they can be executed and analyzed efficiently across a highly distributed hardware environment). Hadoop, managed by the Apache Software Foundation, has been closely associated with the difficulty of handling the large volumes of data generated from web environments. While Hadoop is extremely important, it is still immature and will require time to evolve.
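The split-and-distribute idea behind Hadoop can be illustrated with a minimal sketch of the map/reduce pattern it popularized. This is plain Python rather than Hadoop's actual Java API, and the word-count task and function names are illustrative only:

```python
from collections import defaultdict

def map_phase(fragment):
    # Emit (word, 1) pairs for each word in one fragment of the input,
    # as a mapper would on a single node.
    return [(word, 1) for word in fragment.split()]

def reduce_phase(pairs):
    # Sum the counts per word, as a reducer would after the shuffle step.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Split the input into fragments, standing in for distribution across nodes.
fragments = ["big data big", "data big value"]
mapped = [pair for frag in fragments for pair in map_phase(frag)]
print(reduce_phase(mapped))  # {'big': 3, 'data': 2, 'value': 1}
```

In a real Hadoop cluster the fragments would live on different machines and the framework would handle scheduling, shuffling, and fault tolerance; the sketch only shows the programming model.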
Making things more complicated, many vendors in the market are slapping the big data moniker on whatever technology they happen to be offering at the moment. But capturing the value of big data is one of the most critical challenges for companies, and it needs a thoughtful and innovative approach. So, if big data is much bigger than web data management, what is it? In brief, big data is the ability to manage huge amounts of structured and unstructured data, at petabyte scale, in a way that allows customers to gain business value regardless of the volume, form, or status of that data. It is a big issue that will take many years to address.
A few weeks ago I attended IBM's big data summit. During this event IBM made it clear that it views big data in the context of its information management strategy. Like many vendors in the market, IBM is wasting no time in putting a stake in the ground around big data. IBM views big data as a way to holistically bring together all of the elements of corporate data. There are three dimensions to the way IBM is planning its approach: variety (different types of data), velocity (the speed required to manage the data), and volume (the amount of data in the mix). While the IBM big data strategy is evolving, there were five key take-aways from the meeting:
- Gaining insight from unstructured data requires powerful analytic engines to process and analyze it in real time.
- Big data requires Internet scale; it is not for the timid.
- Helping customers get a handle on massive amounts of data requires levels of abstraction, with the right user interface for each type of user (developer, business executive, etc.).
- What do you need to know from your data? It depends on what business you are in and how you can leverage both your structured and unstructured data. This requires sophisticated management of structured and unstructured data together, based on well-defined master data management. Even more fundamental is the requirement for a way to query that data that serves both employees and the entire partner community.
- The way you analyze this data is the key. You need to understand the context of the information so you are looking at the right elements in the right way. Separating the information that is noise from the information that is insight is imperative. A misreading of the results can send a business down the wrong path with potentially catastrophic results.
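The idea of querying structured and unstructured data together, as in the customer buying-pattern example, can be sketched in a few lines. All of the data, field names, and the keyword heuristic here are hypothetical, and a real system would use far more sophisticated text analytics than a keyword scan:

```python
# Structured customer records joined with unstructured call-center
# notes, flagging at-risk customers with a naive keyword scan.
customers = [
    {"id": 1, "name": "Acme Corp", "annual_spend": 120000},
    {"id": 2, "name": "Globex", "annual_spend": 45000},
]
call_notes = {
    1: "Customer satisfied, renewed contract early.",
    2: "Customer frustrated with outages, mentioned cancel twice.",
}

RISK_WORDS = {"cancel", "frustrated", "complaint"}

def at_risk(customer):
    # Look up the unstructured note for this structured record
    # and scan it for risk keywords.
    note = call_notes.get(customer["id"], "").lower()
    return any(word in note for word in RISK_WORDS)

flagged = [c["name"] for c in customers if at_risk(c)]
print(flagged)  # ['Globex']
```

The point is the join itself: the structured record supplies context (who the customer is, what they spend) while the unstructured text supplies the signal, and neither alone answers the business question.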
IBM is making a significant investment in analytics and applying it to big data. As the IBM executives who spoke at this meeting were quick to admit, it is early in the evolution of big data. IBM is trying to execute an ambitious strategy of bringing together hardware, information management of all varieties, service management software, and middleware, combined with industry frameworks and best practices. The vision will take time to mature, but I believe it is based on the right customer pain.