The automated analysis of "unstructured" data is becoming remarkably agile at giving companies detailed answers to the age-old business question of "How are we doing?"
The tiniest of flaws in a massive forklift truck is crucial information for Ryan McLawhorn, quality improvement manager at NACCO Industries Inc. If his cargo-vehicle division can detect common problems and fix them in the manufacturing process, it can save millions on warranty claims.
That's not easy with 80,000 claims rolling in every year. So McLawhorn turned to data-mining software that examines service reports for precise trends. For years he had software that could alert him, say, to a batch of wiring problems. But now he can be told if a certain wire often comes loose, and under what circumstances.
"It's really almost unlimited," he said.
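The article does not say how Attensity's software finds such patterns. As a rough illustration only, the Python sketch below counts how often a component, a symptom and an operating condition are mentioned together in the same service report. The word lists, sample reports and function names are all hypothetical. Plain keyword matching is also a much cruder stand-in for the linguistic analysis such products are said to perform.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical vocabularies; a real system would derive these from parts
# lists or from the reports themselves rather than hard-coding them.
COMPONENTS = {"wire", "harness", "connector", "sensor"}
SYMPTOMS = {"loose", "frayed", "corroded", "shorted"}
CONDITIONS = {"cold", "vibration", "humid", "overload"}

def tokenize(text):
    """Lowercase a report and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def pattern_counts(reports):
    """Count (component, symptom, condition) triples named in the same report."""
    counts = Counter()
    for report in reports:
        words = set(tokenize(report))
        # Every combination of a mentioned component, symptom and condition.
        for triple in product(COMPONENTS & words, SYMPTOMS & words, CONDITIONS & words):
            counts[triple] += 1
    return counts

# Hypothetical service-report snippets.
reports = [
    "Wire came loose after heavy vibration on rough floor",
    "Operator reports loose wire near mast, vibration noted",
    "Connector corroded, truck stored in humid warehouse",
    "Wire frayed; no unusual conditions",
]
print(pattern_counts(reports).most_common(1))  # prints [(('wire', 'loose', 'vibration'), 2)]
```

A report that names a part and a symptom but no known condition, like the last one above, contributes nothing, which is one reason real systems rely on richer extraction than keyword lists.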
The technology can be made to work not only on service records and other internal data, but also on the hue and cry of the vast public Internet, where products and corporate reputations are obsessively discussed in blogs, message boards and e-commerce sites.
Eastman Kodak Co. uses unstructured-data analysis to spot connections in its own and its competitors' patent filings. Government agents use it to hunt for insider trading or linkages between terrorist groups. Mayo Clinic researchers use it to scan physicians' notes for evidence about the efficacy of treatments.
The breakthrough has been in getting computers to understand the content of the documents they scan.
Often by diagramming sentences as a grammar school student would, text-analysis programs can tell the difference between a blog post that says a motorcycle is so fast "it smokes" and one that says the bike's engine emits smoke.
Picking up on such details quickly is vital in an age when fountains of data gush every minute.
"Our technology, on a simple laptop, can read through 'Moby-Dick' and analyze it in nine seconds," said Craig Norris, head of Attensity Corp., the company that supplied NACCO's software.
In hopes of broadening the potential of this kind of software, several companies planned to announce an agreement Monday on a technological standard that will let multiple computing engines for sorting unstructured data work together.
The software code that governs the framework, which was spearheaded by International Business Machines Corp. in conjunction with academic researchers and the Defense Advanced Research Projects Agency, will be open source and freely available.
The cooperation is required because so many different kinds of unstructured-data engines have sprung up in recent years, driven in large part by the U.S. government's demand for intelligence analysis. The CIA has funded several unstructured-data management companies, including Attensity.
Another CIA-backed company, Intelliseek Inc., recently partnered with the Factiva information service to offer "reputation insight."
Intelliseek scans 4 million Web logs and e-mail list servers, and Factiva - a joint venture between Dow Jones & Co. and Reuters Group PLC - combs news stories, radio transcripts and other media. Together they produce for companies a detailed analysis of how the public thinks about them at any given point.
For example, they can determine the most popular phrases relating to a company and whether those terms are waxing or waning in significance.
Comparisons can be generated with competitors as well as with a company's own business results. Who knows? Perhaps a seemingly unrelated bit of geopolitical news tends to boost sales. Or maybe early word can be gleaned about problems with a product that might lead to an expensive recall.
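The article does not explain how Intelliseek and Factiva measure whether a phrase is waxing or waning. A minimal sketch of the idea, assuming nothing about their actual methods, counts two-word phrases in an earlier and a later batch of text and reports how each phrase's share of all phrases changed between the two. The function names and sample posts are hypothetical, and a real service would also filter common words, track longer phrases and work at far larger volumes.

```python
import re
from collections import Counter

def phrase_counts(docs):
    """Count two-word phrases (bigrams) across a batch of documents."""
    counts = Counter()
    for doc in docs:
        words = re.findall(r"[a-z']+", doc.lower())
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts

def phrase_shift(earlier_docs, later_docs):
    """Map each phrase to its change in share of all phrases between two periods.

    Positive values mean the phrase is waxing; negative values mean it is waning.
    """
    before, after = phrase_counts(earlier_docs), phrase_counts(later_docs)
    total_before = sum(before.values()) or 1  # avoid dividing by zero
    total_after = sum(after.values()) or 1
    return {
        phrase: after[phrase] / total_after - before[phrase] / total_before
        for phrase in before.keys() | after.keys()
    }

# Hypothetical posts about a company from two periods.
earlier = ["Battery life is great", "Battery life could be better"]
later = ["Customer service was slow", "Customer service never called back",
         "Battery life is fine"]
shift = phrase_shift(earlier, later)
print(max(shift, key=shift.get))  # prints "customer service"
```

Comparing shares rather than raw counts keeps a phrase from looking like it is "rising" simply because more text was collected in the later period.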
"The world has become more democratic. In the old days the company would issue a message, and the only alternative to that was, people could meet on the street and talk about it," said Randy Clark, marketing director of ClearForest Corp., a data-analysis company whose customers include Kodak and government agencies. "Now those communications are pretty visible."