Mining Online Sentiment: Can Algorithms Alone Really Tag Blog Posts Accurately?
I’ve been spending a lot of time over the past few months researching companies in the “sentiment analysis” space. When we began developing our own process for categorizing and tagging blog posts with product and/or company affinity, we discovered that most monitoring systems take one of two approaches: an algorithmic approach to text mining, or a human tagging methodology.
Bottom line — have a computer “read” the text, or have humans do it.
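To make the “computer reads the text” side concrete, here’s a deliberately naive sketch of the algorithmic approach: a lexicon-based scorer that counts positive and negative words. The word lists and examples are invented for illustration; no vendor mentioned in this post works this simply, but the sketch shows why unstructured text trips up pure algorithms.

```python
# Toy lexicon-based sentiment tagger (illustrative only; word lists are made up).
POSITIVE = {"love", "great", "excellent", "happy", "recommend"}
NEGATIVE = {"hate", "terrible", "awful", "broken", "refund"}

def tag_sentiment(text: str) -> str:
    """Tag text by counting positive vs. negative words."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(tag_sentiment("I love this phone, great battery!"))  # positive
print(tag_sentiment("Not great. I want a refund."))        # neutral (misses the negation)
```

The second example is exactly the kind of post a human tagger gets right instantly and a naive algorithm muddles: “great” and “refund” cancel out, and the negation is ignored.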
I’m hearing conflicting reports about the pure algorithmic approach and its accuracy. Academic research largely suggests that you can’t get much better than 80% accuracy when analyzing “unstructured” content. Others claim that the right algorithms can practically tell you a blogger’s shoe size.
Our foray into this space started when a founder of one of the more prominent (and well-funded) brand monitoring companies confided to me that their year-long initiative to pursue algorithmic sentiment detection was considered a failure because it achieved, at best, 80 percent accuracy.
Technical gurus at another well-funded and well-known firm in this space confirmed in discussion the same 80 percent figure for their algorithmic process.
Given their experiences, I wonder if most of these claims of highly accurate sentiment tagging using only algorithms are just PR spin.
Seth Grimes recently wrote an article on the subject that implies 80 percent is high on the scale:
“Text analytics/content management vendor Nstein reports that their Nsentiment annotator, ‘when trained with appropriate corpus, can achieve a precision and recall score between 60% to 70%.’ These are good numbers when it comes to attitudinal information. Michelle DeHaaff, marketing VP at Attensity, says that ‘getting beyond sentiment to actionable information, to “cause,” is what our customers want. But first, you’ve got to get sentiment right.’”
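For readers unfamiliar with the terms in that quote: precision asks, of the posts the system tagged positive, how many really were; recall asks, of the truly positive posts, how many the system caught. A quick sketch with made-up labels (the data here is invented, not from any vendor) shows how the two numbers are computed:

```python
# Made-up predictions vs. ground-truth labels for six blog posts.
predicted = ["pos", "pos", "neg", "pos", "neg", "pos"]
actual    = ["pos", "neg", "neg", "pos", "pos", "pos"]

# Count true positives, false positives, and false negatives for "pos".
tp = sum(p == a == "pos" for p, a in zip(predicted, actual))
fp = sum(p == "pos" and a != "pos" for p, a in zip(predicted, actual))
fn = sum(p != "pos" and a == "pos" for p, a in zip(predicted, actual))

precision = tp / (tp + fp)  # 3 / 4 = 0.75
recall = tp / (tp + fn)     # 3 / 4 = 0.75
print(f"precision={precision:.0%}, recall={recall:.0%}")
```

Note that a system can score 60–70% on both measures, as in the Nstein figure, and still mislabel nearly a third of what it touches, which is why those numbers read as modest rather than impressive.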
We have developed a hybrid platform that provides human-level accuracy with the benefits of an automated environment. We’re doing exhaustive testing now, but we’re seeing accuracy way beyond 80 percent. Check it out here.
One company touting the algorithmic approach is SPSS. They work closely with Anderson Analytics, which provides services in this space. It appears surveys are one of the main content sources they process, which seems like rather “structured” content to me. No doubt that boosts accuracy. Tom Anderson’s blog is here, and he discusses an upcoming webinar on the subject.