From the category archives:

sentiment analysis

Andy Beal Interview: More Detail About Blog Monitoring with Trackur

by Steve Broback on April 29, 2008

I recently had a chance to interview Web marketing guru Andy Beal about Trackur, his new blog/press monitoring service. Since we had just unveiled our own sentiment tracking system, I was intrigued by what appears to be a complementary offering.

I tried to get Andy to reveal a little of what goes on behind the technological curtain, but understandably he was a bit reserved about detailing trade secrets.

Note the final paragraph. This is where Andy really aligned with our thinking regarding sentiment tracking. As Alan Wilensky says, sentiment is the weakest of CGM metrics.

I’d be eager to hear from any clients about how the service is helping their brand monitoring efforts. The current buzz has been quite positive.

Here is the interview in its entirety:

Steve Broback: Many companies have “rolled their own” monitoring systems by aggregating custom search feeds from multiple sources. Other than the time it takes to create these searches, the challenge is that duplicate content, old content, and spam (splogs!) need to continually be weeded out. Is Trackur intended to be the alternative to this largely manual process?

Andy Beal: Absolutely! We built Trackur because creating custom search feeds was too time consuming and we couldn’t get the filtering and reporting options we needed. With Trackur, you enter your keyword one time and then let Trackur automatically monitor the different types of social media for you. You can filter out unwanted items, sort the results, email items to co-workers, and subscribe via RSS or email updates. You can’t do any of that when you manually monitor your reputation.
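
(A quick aside for readers who want to see what the "roll your own" approach looks like in practice: the sketch below pulls a couple of keyword search feeds, skips links it has already seen, and screens out a hand-maintained splog list. It assumes the Python feedparser library, the feed URLs and domains are placeholders, and it is not a description of how Trackur works.)

    import feedparser  # pip install feedparser

    # Placeholder keyword-search feeds -- substitute whatever sources you monitor.
    SEARCH_FEEDS = [
        "http://blogsearch.example.com/rss?q=yourbrand",
        "http://newssearch.example.com/rss?q=yourbrand",
    ]
    SPLOG_DOMAINS = {"known-splog.example.com"}  # hand-maintained spam-blog list

    def fetch_new_items(seen_links):
        """Return items not seen before and not from a known splog domain."""
        fresh = []
        for url in SEARCH_FEEDS:
            for entry in feedparser.parse(url).entries:
                link = entry.get("link", "")
                if not link or link in seen_links:
                    continue  # empty or duplicate
                if any(domain in link for domain in SPLOG_DOMAINS):
                    continue  # manually flagged splog
                seen_links.add(link)
                fresh.append((entry.get("title", ""), link))
        return fresh

    seen = set()
    for title, link in fetch_new_items(seen):
        print(title, "-", link)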

Steve Broback: Are we correct in assuming that Trackur taps into multiple existing search engines and then de-dupes and de-spams the results?

Andy Beal: Trackur does a great job of filtering out the noise and focusing on the signal–the content that matters most to your reputation. It doesn’t remove all duplication, and actually, you probably don’t want to remove it all. A post might show up in Technorati one week, then again on Digg.com the next–if you removed the duplicates, you’d miss this recurrence.

Steve Broback: What search engines are you leveraging?

Andy Beal: Trackur pulls from a wide selection of content. It’s not really a search engine, more of a reputation aggregator. We don’t provide a complete list of sources, but we do include some unique content such as Flickr, YouTube, and Digg.

Steve Broback: Have you created your own crawler of any kind, or is it exclusively tapping into existing indexing services?

Andy Beal: We didn’t set out to make Trackur a web crawler. It’s a reputation monitoring and aggregation tool. Its power comes from bringing a wide range of web content together in a central database, then giving you powerful tools to manage the data.

Steve Broback: Google has been working this problem for years without much luck — how good is Trackur at removing splogs from results?

Andy Beal: Removing splogs from search engine results is extremely tough, so we’ll leave that to Google’s immense resources. Instead, Trackur focuses on providing clients with the tools they need to pinpoint conversations which include their reputation. If a Trackur client finds a splog showing up, they can add a filter to remove it from any future results.

Steve Broback: How do you avoid filtering out relevant content?

Andy Beal: We advise Trackur users to start off with the broadest of searches. For example, if you are Apple, start by monitoring “Apple” and see what’s tracked. If you find too many irrelevant results–or simply want to be more refined with your monitoring–you can add filters to focus on a particular word (such as “iPhone”) or remove the unnecessary results.

Steve Broback: How do you ensure that old posts don’t re-emerge in search results?

Andy Beal: Actually, we don’t believe it’s a smart practice to say, “never show me this result again” when it comes to reputation monitoring. If a blog post attacks your reputation, you need to know if it keeps resurfacing–that would suggest that the post is being revisited or discussed by others.

Steve Broback: Have you applied for any patents specific to Trackur?

Andy Beal: Not at this time. There are processes we could patent, but we’re not finished enhancing Trackur’s technology, so we’ll probably wait until we’ve added new features before applying for a patent.

Steve Broback: Shane Atchison says sentiment is the “next great analytics frontier”, and we’ve been focused on that metric of late. Are there any plans to integrate sentiment tagging into Trackur results?

Andy Beal: Sentiment analysis is definitely something we’re exploring with Trackur. The biggest problem is that it’s virtually impossible to accurately ascertain the sentiment of web content using an algorithm. Apart from the need for human interpretation as to what is positive or negative, technology gets confused by statements such as "Apple Macs are wicked bad!"
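
Andy's closing example deserves a little unpacking. The simplest algorithmic approach is a word-list scorer, and it has no way of knowing that "wicked bad" is slang for "very good." A toy illustration (the word lists are invented, and this is not Trackur's or Sentimine's method):

    # Toy lexicon scorer: count positive words minus negative words.
    POSITIVE = {"great", "love", "awesome", "good"}
    NEGATIVE = {"bad", "terrible", "hate", "broken"}

    def naive_sentiment(text):
        words = text.lower().strip("!?.").split()
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        return "positive" if score > 0 else "negative" if score < 0 else "neutral"

    # The blogger means this as praise; the word list sees "bad" and votes negative.
    print(naive_sentiment("Apple Macs are wicked bad!"))  # -> negative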

More Proof That Blog Sentiment Mining is Big Business

by Steve Broback on April 22, 2008

Buzz Bruggeman sent me this info a few days ago. Collective Intellect has closed another round of financing, this time worth $6.6M. Their total take so far is $11.2M. The bulk of the services they provide are social media tracking and sentiment analysis.

An interesting note from this article: it appears their initial foray into sentiment analysis was providing investor-related analysis services to Wall Street. That idea seems to have been eclipsed by brand monitoring.

Despite the fact that this arena is viewed as highly attractive to investors, we have purposely eschewed pursuing VC funding for Sentimine. It seems to us that the pressure to monetize quickly, perhaps prematurely, and the risk that sentiment becomes commoditized put heavily capitalized players in a less competitive position.

I often joke that we need to do a press release touting how we’ve secured $147.50 in our third round of financing for our service.

Jake McKee on Sentiment: Confirms What Shane Atchison Predicted Over a Year Ago

by Steve Broback on April 21, 2008

Monitoring blogger sentiment is critical to journalists, according to a report cited by Jake McKee today. Seems like Sentimine, our new platform for aggregating and tracking blogger sentiment, may have a role beyond brand monitoring. It might also prove a useful tool for journalistic endeavors.

I’ve been reading Actionable Web Analytics: Using Data to Make Smarter Business Decisions by Shane Atchison and Jason Burby. Shane (co-founder of ZAAZ) wrote a post for ClickZ back in March of 2007 claiming that sentiment is the “next great analytics frontier.” Seems to me that if companies and now journalists are tracking blogger sentiment, we may be onto something…

How Monitoring the Blogosphere Buzz Could Make You Money

by Steve Broback on April 4, 2008

I’ve put in many hours this past week surfing for posts, articles, and papers covering the sentiment analysis space. We’re preparing to give several presentations focusing on Sentimine, our sentiment analysis service, so I’m assimilating the latest info.

One of the more interesting pages just landed in my browser.

A paper by Veljko Fotak, a doctoral student at the University of Oklahoma’s Price College of Business, shows a correlation between blog stock recommendations and equity prices. This implies that closely following financial bloggers who are bullish (or bearish) on specific equities may give investors an edge.

We are steering Sentimine toward brand monitoring uses at this point, but the financial applications may be a logical move down the road.

Automated Sentiment Detection Round 2: 80% Accuracy Confirmed for Blogs and Unstructured Content

by Steve Broback on March 15, 2008

I have more data points relevant to yesterday’s post. Bottom line: Yes, you need non-trivial human involvement to go beyond 80 percent accuracy with unstructured content like blogs. Text-mining vendors claim that for many projects 80 percent is perfectly adequate though. Based on what I’m reading, I think there is likely a market for a process like ours that can automate the tagging and extraction/compilation of relevant content at high (90 percent plus) accuracy levels.
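
For context, when a vendor quotes an accuracy figure it generally means agreement between the automated tags and a human-tagged sample of the same posts; the arithmetic is as simple as it sounds. A quick illustration with made-up tags:

    # Accuracy = share of posts where the automated tag matches the human tag.
    # Both lists are invented for illustration.
    human_tags = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg", "pos", "pos"]
    auto_tags  = ["pos", "neg", "pos", "pos", "neu", "pos", "neu", "neg", "pos", "neg"]

    matches = sum(h == a for h, a in zip(human_tags, auto_tags))
    print(f"accuracy: {matches / len(human_tags):.0%}")  # 70% on this sample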

After drafting yesterday’s post about mining blog sentiment, I discovered a Feb 27 post on the SentimentMetrics blog that reinforced what I’d heard from other gurus in the space. The SentimentMetrics blogger (Leon? The posts don’t list the author’s name) says:

“SentimentMetrics uses an automated approach and we are currently at an 80% accuracy which is considered good in the industry…”

In addition, Mark Anderson responded to my post yesterday with a comment on his own blog. Anderson clarified:

“If you are working with longitudinal data, comparing month to month for instance, or comparing different products and brands then extremely accurate sentiment reading isn’t necessary as you are really looking for differences between groups. Additionally by considering the relationship between positive and negative sentiment in trended data (they tend to be positively correlated) when the correlation changes, in other words in one month for one brand you might see that negative sentiment increases while positive decreases, this signals a possible ‘event’ is occurring which needs to be drilled down into for further investigation.

However, for some of our clients in the past (such as Unilever), an extremely accurate level of sentiment was desired. Our methodology (AA-TextSM) relies on triangulation for validation, and we have sentiment accuracy in high nineties in most cases when applying this technique. Because most of our projects are ad-hoc in nature, the human factor is very important, so Anderson Analytics, more so than those companies focusing solely on a large volume of blog posts usually invest the time in perfecting custom dictionaries and understanding the special relationships between words in each project.

As you mention, many survey open ends are rather structured. On the other hand many are not. For instance if you ask a hotel guest to rate their overall satisfaction on a 10 point scale, then ask, why did you give this rating in an open ended question, you will get anything but structured answers. Our methodology has been used on other types of data as well though (call center logs, emails etc.).”
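
Anderson's point about trended data is worth restating concretely: month to month, positive and negative mention counts usually rise and fall together (more buzz means more of both), so a month where negative climbs while positive drops is the anomaly worth drilling into. A rough sketch of that check, using invented monthly counts:

    # Flag months where negative mentions rise while positive mentions fall --
    # the divergence Anderson describes as a possible "event". Counts are invented.
    months   = ["Jan", "Feb", "Mar", "Apr"]
    positive = [120, 135, 150, 140]
    negative = [30, 34, 38, 60]

    for i in range(1, len(months)):
        pos_change = positive[i] - positive[i - 1]
        neg_change = negative[i] - negative[i - 1]
        if neg_change > 0 and pos_change < 0:
            print(f"{months[i]}: negative up {neg_change}, positive down {-pos_change} -- investigate")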

It sounds like the AA-TextSM system requires human involvement to customize the algorithmic process. In that last paragraph, Anderson attests that surveys can contain unstructured data. It seems to me that without getting humans involved (such as to create custom dictionaries), you fall back to 80 percent accuracy when analyzing those unstructured portions.

Mining Online Sentiment: Can Algorithms Alone Really Tag Blog Posts Accurately?

by Steve Broback on March 14, 2008

I’ve been spending a lot of time over the past few months researching companies in the “sentiment analysis” space. When we began developing our own process for categorizing/tagging blog posts with product and/or company affinity, we discovered that most monitoring systems take one of two approaches. They either take an algorithmic approach to text mining, or use a human tagging methodology.

Bottom line — have a computer “read” the text, or have humans do it.

I’m hearing conflicting reports about the pure algorithmic approach and its accuracy. Academic research largely attests that you can’t get much better than 80% accuracy when analyzing "unstructured" content. Others claim that the right algorithms can practically tell you a blogger’s shoe size.

Our foray into this space started when a founder of one of the more prominent (and well-funded) brand monitoring companies confided to me that their year-long initiative pursuing algorithmic sentiment detection was considered a failure because it achieved at best 80 percent accuracy.

Technical gurus at another well-funded and well-known firm in this space confirmed in conversation the same 80 percent figure for their algorithmic process.

Given their experiences, I wonder if most of these claims of highly accurate sentiment tagging using only algorithms are just PR spin.

Seth Grimes recently wrote an article on the subject that implies 80 percent is high on the scale:

"Text analytics/content management vendor Nstein reports that their Nsentiment annotator, 'when trained with appropriate corpus, can achieve a precision and recall score between 60% to 70%.' These are good numbers when it comes to attitudinal information. Michelle DeHaaff, marketing VP at Attensity, says that 'getting beyond sentiment to actionable information, to "cause," is what our customers want. But first, you've got to get sentiment right.'"

We have developed a hybrid platform that provides human-level accuracy with the benefits of an automated environment. We’re doing exhaustive testing now, but we’re seeing accuracy way beyond 80 percent. Check it out here.
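
Without giving away the recipe, the general shape of a hybrid pipeline is easy to describe: let the machine tag what it is confident about and route the ambiguous remainder to people. The sketch below shows that common industry pattern in the abstract; the classifier and threshold are placeholders, and this is not a description of how Sentimine actually works.

    # Generic human-in-the-loop pattern: auto-tag high-confidence posts,
    # queue the rest for human review. The classifier is a stand-in.
    CONFIDENCE_THRESHOLD = 0.85

    def classify(post):
        """Placeholder automated scorer: returns (label, confidence)."""
        return ("positive", 0.60)  # stub value for illustration

    def route(posts):
        auto_tagged, review_queue = [], []
        for post in posts:
            label, confidence = classify(post)
            if confidence >= CONFIDENCE_THRESHOLD:
                auto_tagged.append((post, label))
            else:
                review_queue.append(post)  # a person tags these
        return auto_tagged, review_queue

    auto, queue = route(["Apple Macs are wicked bad!"])
    print(len(auto), "auto-tagged;", len(queue), "sent for human review")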

One company touting the algorithmic approach is SPSS. They work closely with Anderson Analytics, which provides services in this space. It appears surveys are one of the main content sources they process, which seems like rather "structured" content to me. No doubt that boosts accuracy. Tom Anderson’s blog is here, and he discusses an upcoming webinar on the subject.

Relevant contributions on this subject can be found from bloggers Matthew Hurst, Stephen E. Arnold, Nathan Gilliatt, and Seth Grimes.
