Patterns for Success – Options for Analyzing Unstructured Information
By Dr. Fern Halper, Partner
Unstructured text data is everywhere in a company. Examples include call-center logs, web pages, customer service survey results, emails, online discussions, claims forms, sales returns, and any other document that leaves room for comments or other free-text fields. Marketers, as well as those concerned with issues such as fraud, product incidents, and customer service, have long considered analyzing this text something of a Holy Grail. Yet probably fewer than 25% of large companies are actually analyzing this information; in many instances, the rest simply don’t have the bandwidth to deal with it. To be truly competitive, however, organizations need to analyze and leverage this text by deploying text analytics technologies.
Let’s look at an example. A telecommunications company wants to understand retention issues with its wireless customers. At hand, it has structured data consisting of order history, billing records, demographic information, and responses to promotions. The company mines this information and sees some interesting patterns, but nothing conclusive. It then takes its call-center records and analyzes the text in them. The result? Customers are not satisfied with the features the company offers on its wireless phones. This finding didn’t appear in the structured data, but it did appear in the call-center records. The company then merges this text-derived data with its structured data and examines the demographics of this group to take the analysis further.
So, what exactly is text analytics? Text analytics is the process of extracting information from unstructured text and transforming it into structured information that can then be mined and analyzed in various ways. This transformed information can be combined with other structured data a company owns (e.g., sales or demographic data) and analyzed using predictive and automated discovery techniques. Alternatively, the text can be extracted, transformed, and then analyzed interactively to determine relationships and trends, look for clusters, and so on. The extraction itself relies on techniques from computational linguistics, statistics, and other computer science disciplines. For example, computational linguistic algorithms can parse sentences to extract the who, what, where, when, and why in text. Text analytics differs from search, although it can be used to augment search: in basic search technologies, end users know what they are looking for, whereas text analytics can surface patterns and relationships they did not know to ask about. Interestingly, search is now evolving and converging with business intelligence to provide applications that might, for example, monitor news feeds to understand what competitors are doing.
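To make the idea of turning text into structured information concrete, here is a minimal Python sketch under stated assumptions. It is not how any vendor named in this article works: a hand-built keyword lexicon (`LEXICON`, hypothetical) stands in for the linguistic and statistical extraction described above, and the notes and customer records are made-up examples echoing the wireless retention scenario. The functions `extract`, `merge`, and `mentions_by` are likewise hypothetical names.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping surface words to analysis categories.
# It stands in for real linguistic extraction (parsing, entity recognition).
LEXICON = {
    "feature": "features", "features": "features", "camera": "features",
    "bill": "billing", "billing": "billing", "charge": "billing",
    "dropped": "coverage", "signal": "coverage",
}
CATEGORIES = ("features", "billing", "coverage")


def extract(note: str) -> Counter:
    """Turn one free-text note into per-category counts (a structured record)."""
    tokens = re.findall(r"[a-z]+", note.lower())
    return Counter(LEXICON[t] for t in tokens if t in LEXICON)


def merge(notes: dict, customers: dict) -> list:
    """Join text-derived columns onto structured customer rows by customer ID."""
    rows = []
    for cust_id, attrs in customers.items():
        counts = extract(notes.get(cust_id, ""))  # customers without notes get zeros
        row = {"id": cust_id, **attrs}
        for cat in CATEGORIES:
            row[f"{cat}_mentions"] = counts[cat]
        rows.append(row)
    return rows


def mentions_by(rows: list, attr: str, column: str) -> dict:
    """Total a text-derived column by a structured attribute such as age band."""
    totals = Counter()
    for row in rows:
        totals[row[attr]] += row[column]
    return dict(totals)


# Made-up example data echoing the wireless retention scenario.
notes = {
    "c1": "Customer unhappy: phone lacks the features she wanted, camera is poor.",
    "c2": "Asked about a charge on last bill.",
}
customers = {
    "c1": {"age_band": "18-24", "plan": "prepaid"},
    "c2": {"age_band": "35-44", "plan": "contract"},
    "c3": {"age_band": "18-24", "plan": "contract"},
}
rows = merge(notes, customers)
print(mentions_by(rows, "age_band", "features_mentions"))  # {'18-24': 2, '35-44': 0}
```

A real deployment would replace the lexicon with proper linguistic processing; the sketch only shows that once each note becomes a row of counts, it can be joined and aggregated like any other structured column.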
While the field is still evolving, there are a number of players out there worth noting.
- Business intelligence powerhouses SPSS and SAS both offer solutions in this space tied to their data mining and predictive analytics products. SPSS’s Predictive Text Analytics solution combines the linguistic technologies of its LexiQuest text mining products with the data mining capabilities of Clementine. SAS Text Miner is integrated with SAS Enterprise Miner and lets users mine structured and unstructured information together. SAS also has technologies for finding relationships between documents.
- Other companies, such as Attensity, Inxight, ClearForest, and nStein, provide information extraction technologies that can be leveraged in various analytical activities. For example, Attensity offers a number of extraction techniques together with a series of its own applications that let users interactively explore and analyze information found in text. Attensity also works with third-party software. Inxight provides text extraction software that can be used with its visualization technologies to determine relationships and trends in text data; it also has applications that augment the capabilities of search engines.
- Companies such as Clarabridge Inc. focus on preprocessing text data to make it more useful in business intelligence packages. Its product, the Clarabridge Content Mining Platform, provides connectors to information sources, transforms the information using various extraction techniques, performs data quality and staging work on the results, and provides a schema that can serve the information up to various BI packages.
- Even the big players like IBM, Oracle, and Microsoft are making moves to offer solutions in the text analytics space. IBM has developed the Unstructured Information Management Architecture (UIMA), an open-source framework that defines a common set of interfaces for integrating different text analytic components and applications.
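Several of the offerings above share a common shape: pluggable extraction components feeding a cleanup and staging step that BI tools can query. The following Python sketch illustrates that general pattern under explicit assumptions. It is loosely inspired by UIMA’s idea of components that share one interface and add annotations to a common structure, but it is not the UIMA API (UIMA defines its interfaces around a Common Analysis Structure, or CAS), nor a description of Clarabridge’s platform. `Document`, `Annotator`, `ProductAnnotator`, `SentimentAnnotator`, `run_pipeline`, `stage`, and the `annotation` table are all hypothetical, and an in-memory SQLite database stands in for a BI-facing schema.

```python
import re
import sqlite3
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


@dataclass
class Document:
    doc_id: str
    text: str
    # Each annotation is (component name, label, matched text).
    annotations: List[Tuple[str, str, str]] = field(default_factory=list)


class Annotator(Protocol):
    """The one interface every hypothetical component implements."""
    name: str

    def process(self, doc: Document) -> None: ...


class ProductAnnotator:
    """Hypothetical component: tags mentions of known product names."""
    name = "product"

    def __init__(self, products: List[str]):
        alternatives = "|".join(map(re.escape, products))
        self.pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    def process(self, doc: Document) -> None:
        for match in self.pattern.finditer(doc.text):
            doc.annotations.append((self.name, "PRODUCT", match.group(0)))


class SentimentAnnotator:
    """Hypothetical component: flags negative wording from a tiny word list."""
    name = "sentiment"
    NEGATIVE = {"broken", "slow", "unhappy", "cancel"}

    def process(self, doc: Document) -> None:
        for word in re.findall(r"[a-z]+", doc.text.lower()):
            if word in self.NEGATIVE:
                doc.annotations.append((self.name, "NEGATIVE", word))


def run_pipeline(docs: List[Document], annotators: List[Annotator]) -> List[Document]:
    """Run each component over each document through the shared interface."""
    for doc in docs:
        for annotator in annotators:
            annotator.process(doc)
    return docs


def stage(docs: List[Document], conn: sqlite3.Connection) -> None:
    """Normalize and de-duplicate annotations, then load a table BI tools can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation ("
        "doc_id TEXT, component TEXT, label TEXT, term TEXT, "
        "UNIQUE (doc_id, component, label, term))"
    )
    for doc in docs:
        for component, label, term in doc.annotations:
            # Lower-casing is the data-quality step; UNIQUE + OR IGNORE drops repeats.
            conn.execute(
                "INSERT OR IGNORE INTO annotation VALUES (?, ?, ?, ?)",
                (doc.doc_id, component, label, term.lower()),
            )
    conn.commit()


docs = [
    Document("d1", "The Model X phone is slow and I may cancel."),
    Document("d2", "Model X screen broken; model x support was slow."),
]
conn = sqlite3.connect(":memory:")
stage(run_pipeline(docs, [ProductAnnotator(["Model X"]), SentimentAnnotator()]), conn)
summary = conn.execute(
    "SELECT label, term, COUNT(*) FROM annotation GROUP BY label, term ORDER BY label, term"
).fetchall()
for row in summary:
    print(row)
```

Because every component implements the same `process` method, a new extractor can be added to the list without changing `run_pipeline`; that is the integration benefit a common interface is meant to provide.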
Hurwitz & Associates believes that this rapidly evolving space is the future of data analysis. We are undertaking a major research study in the area of text mining and analytics. The study, entitled “Text Analytics – the Road to Understanding your Company’s Unstructured Information,” examines topics including the text analytics market, the technology behind text analytics, vendor offerings, and the value of this technology. The report is targeted for publication in early Q4’06. We invite any vendor in this market to participate in the study. We are also looking for input from end users: if you currently use text mining technologies or are considering them, we invite you to participate as well.
Please contact Fern Halper at firstname.lastname@example.org for further information.