By Dr. Fern Halper and
On November 15, 2006, IBM announced that it had turned over the standardization of the Unstructured Information Management Architecture (UIMA) to the Organization for the Advancement of Structured Information Standards (OASIS). In addition, the Apache Software Foundation has established an incubator project for developing UIMA-based software. What is the significance of this announcement?
First, it is important to understand what UIMA actually is. UIMA is an open framework and software development kit for building applications that use unstructured information: the kind of information found in call center logs, blogs, customer surveys, and so on. The UIMA standard enables developers to build UIMA-compliant components, called annotators, that contain the logic to analyze this unstructured data. For example, one annotator might contain the logic needed to extract a person’s name from a text document, and another the logic to extract a phone number. Annotators can be as granular as the designer wants them to be. There can be entity annotators (for example, the name and phone number annotators above), relationship annotators, concept annotators, and so on. Combinations of these semantic analyses, together with search annotators, can be used to build text analysis solutions, and annotators can be reused across applications.
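To make the idea of an annotator concrete, here is a minimal sketch in Python. It is not the UIMA API: the `Annotation` record, the `regex_annotator` helper, and the two deliberately naive patterns are all assumptions made for illustration. It only shows the general shape of small, reusable components that each mark up one kind of entity in the same text.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record: a typed span of text (not a UIMA class)."""
    kind: str   # e.g. "Person" or "PhoneNumber"
    begin: int  # start offset in the source text
    end: int    # end offset (exclusive)
    text: str   # the covered text


# An annotator here is simply a function from text to annotations.
Annotator = Callable[[str], List[Annotation]]


def regex_annotator(kind: str, pattern: str) -> Annotator:
    """Build an entity annotator from a regular expression (illustrative only)."""
    compiled = re.compile(pattern)

    def annotate(text: str) -> List[Annotation]:
        return [
            Annotation(kind, m.start(), m.end(), m.group())
            for m in compiled.finditer(text)
        ]

    return annotate


# Naive patterns chosen for the example; real name and phone
# extraction needs far more than this.
person_annotator = regex_annotator(
    "Person", r"\b(?:Mr|Ms|Mrs|Dr)\. [A-Z][a-z]+(?: [A-Z][a-z]+)?"
)
phone_annotator = regex_annotator("PhoneNumber", r"\b\d{3}-\d{3}-\d{4}\b")


def run_annotators(text: str, annotators: List[Annotator]) -> List[Annotation]:
    """Apply each annotator to the same text and merge results by position."""
    found: List[Annotation] = []
    for annotator in annotators:
        found.extend(annotator(text))
    return sorted(found, key=lambda a: a.begin)


text = "Dr. Jane Smith called from 555-123-4567 about her survey."
annotations = run_annotators(text, [person_annotator, phone_annotator])
for a in annotations:
    print(a.kind, repr(a.text))
```

Because every annotator in this sketch shares one function signature, either one could be dropped into a different application unchanged, which is the reuse property described above.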
Consider the following example. An automobile insurance company wants to understand tire-related claim issues. Using annotators built to the UIMA standard, the company can piece together a solution that goes beyond searching on words such as tires and problems and using claim number information: annotators that apply semantic analysis can also determine the conditions and actions associated with these claims. The solution can be further enhanced by integrating it with business intelligence software to visualize and analyze the information. The key point is that these annotators may originally have been built by different groups but are now reused by the insurance company to build its particular solution.
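The insurance scenario can be sketched the same way. The Python below is a hypothetical illustration rather than UIMA code. The `CLM-` claim number format, the keyword set, and the condition and action vocabularies are invented for the example, and a simple phrase lookup stands in for real semantic analysis. It shows three independently written components being combined into one claim analysis.

```python
import re
from typing import Dict, List

# Assumed vocabularies, invented for this illustration.
KEYWORDS = {"tire", "tires", "problem", "problems"}
CONDITIONS = ("blowout", "tread separation", "flat")
ACTIONS = ("replaced", "repaired", "towed")


def claim_number_annotator(text: str) -> List[str]:
    """Find claim numbers, assuming a hypothetical 'CLM-<digits>' format."""
    return re.findall(r"\bCLM-\d+\b", text)


def keyword_annotator(text: str) -> List[str]:
    """Plain keyword search: which words of interest appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words & KEYWORDS)


def condition_action_annotator(text: str) -> Dict[str, List[str]]:
    """Stand-in for semantic analysis: look up known conditions and actions."""
    lowered = text.lower()
    return {
        "conditions": [c for c in CONDITIONS if c in lowered],
        "actions": [a for a in ACTIONS if a in lowered],
    }


def analyze_claim(text: str) -> Dict[str, List[str]]:
    """Combine the three components into one record per claim."""
    return {
        "claim_numbers": claim_number_annotator(text),
        "keywords": keyword_annotator(text),
        **condition_action_annotator(text),
    }


claim = "Claim CLM-1042: tire blowout on the highway; vehicle towed and tire replaced."
result = analyze_claim(claim)
print(result)
```

A record like `result` is the sort of structured output that could then be loaded into business intelligence software for the visualization and analysis step.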
So, why is this announcement significant? While the value of text analytics is high, adoption has been relatively slow. By providing a common set of interfaces into which vendors can plug their text analytics components, UIMA could make it easier to integrate text analytics into a broader set of solutions and ultimately help increase adoption of the technology. The standard lowers the barrier to entry and lets developers focus on value-added, domain-specific analytics. A standard also lends credibility to the market, which will help its smaller companies.
A number of text analytics vendors, including Attensity, ClearForest, SAS Institute, SPSS, Temis, and nStein, have already endorsed the framework. Business intelligence vendors such as Cognos have also come on board. Carnegie Mellon University’s Language Technologies Institute is hosting a UIMA Component Repository web site. All of this is an important step in the right direction for text analytics adoption.