How Google treats duplicate content from scrapers

by Jason Preston on June 10, 2008

I know I’ve said it before, but if you’re running your web site and you’re not paying attention to the Official Google Webmaster Central Blog, you’re ignoring a very good resource.

Just yesterday, Sven Naumann (who is on the search quality team) wrote a post dealing with concerns webmasters have about scraper sites that pull exact post content and republish it as their own.

So what does Google do? They use a couple of methods, which they don’t identify, to determine which copy of a duplicated piece is the original, and then point searchers to that one.

For people who see scraping sites placing higher in search results than their own, original content, they offer this advice:

Some webmasters have asked what could cause scraped content to rank higher than the original source. That should be a rare case, but if you do find yourself in this situation:

  • Check if your content is still accessible to our crawlers. You might unintentionally have blocked access to parts of your content in your robots.txt file.
  • You can look in your Sitemap file to see if you made changes for the particular content which has been scraped.
  • Check if your site is in line with our webmaster guidelines.
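The first check on that list is easy to script. Here is a minimal sketch in Python, using the standard library's `urllib.robotparser`, that reports which of your URLs a set of robots.txt rules would block for a given crawler. The sample rules, the example.com URLs, and the `blocked_paths` helper are all made up for illustration; this is not Google's own tooling, and the `Googlebot` user-agent string is only the default I chose here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; substitute the text of your site's real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /archive/
"""


def blocked_paths(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules block for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical post URLs: one normal permalink, one under a blocked directory.
urls = [
    "http://example.com/2008/06/my-post/",
    "http://example.com/archive/2008/06/my-post/",
]

for url in blocked_paths(SAMPLE_ROBOTS_TXT, urls):
    print("Blocked for crawlers:", url)
```

If a post you expect to rank turns up in that output, the robots.txt rule is the first thing to fix. To test a live site instead of pasted text, `RobotFileParser` can also fetch the file itself through `set_url()` and `read()`.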

Largely, the answer seems to be “don’t worry, we’ve got a handle on it.” But if you want more details on how to minimize duplicate content within your own domain, you can go check out the post.

What I think Google really needs to solve, though, is “old” content arriving online for the first time. How do you figure out who the real owner of that content is?

3 comments

1 Georgia Jenkins 06.20.08 at 10:33 am

This would definitely present a problem. I’m wondering if the “scrapers” or plagiarists might have added material to their site without giving credit.
Perhaps one could improve the clarity of their site, or add graphics to enhance the original?

Georgia Jenkins

2 Jen 06.24.08 at 9:48 pm

I worry about copyrighting too. Everyone has the right to get what’s there. Maybe the answer is using the copyright software everyone is talking about. If you don’t know it, try it: Glyphius

3 nickle young 08.17.08 at 1:17 am

Yes! I really worry about copyrighting! This would definitely present a problem. How do you figure out who the real owner of that content is?
