Plugging Up Those Duplicate Content Holes
Is your site suffering from unfortunate duplicate content? Do you wake up at night in fear that your perfectly SEO’d pages will be filtered out of the SERPs? Have you suffered any of the following symptoms: link dilution, ranking fragmentation, important pages disappearing, rank checking-induced vomiting? If so, we can help!
Here are some of the common causes of duplicate content and how to best avoid them.
Accept it; most people are jerks. That means many folks out there will have no problem stealing your content and slapping it on their own site to place ads on it. It’s up to you to protect your valuable content from these villains. The first step in protecting your site is to mark your territory. That includes using your brand name frequently within the content, sticking to absolute links, and hosting your images locally. People will likely steal your content anyway, but this will make it harder for them to do so and easier for you to spot.
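The absolute-links advice is easy to audit. Below is a minimal sketch, using only Python’s standard `html.parser` and `urllib.parse`, that lists the links on a page that are still relative along with the absolute URL each one should become. The `example.com` domain and sample HTML are placeholders, and `find_relative_links` is a hypothetical helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_relative_links(html, base_url):
    """Return (relative_href, absolute_equivalent) pairs for links with no scheme or host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    results = []
    for href in collector.hrefs:
        parsed = urlparse(href)
        # Skip fragment-only links and anything that already has a scheme (http:, mailto:) or host
        if parsed.scheme or parsed.netloc or href.startswith("#"):
            continue
        results.append((href, urljoin(base_url, href)))
    return results


page = '<a href="/shoes/red">Red</a> <a href="http://www.example.com/about">About</a>'
for rel, absolute in find_relative_links(page, "http://www.example.com/blog/"):
    print(rel, "->", absolute)  # /shoes/red -> http://www.example.com/shoes/red
```

Running this over your templates before publishing is one way to make sure scraped copies still carry links back to your domain.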
You’ll also want to keep a lookout for copies. There are plenty of ways to go about this. Copyscape is a good tool for hunting down stolen passages, or you can copy a snippet of unique text and set it up as a Google Alert. If you find that someone has stolen your content and that it has caused Google to trigger a filter, either ask them to take it down or consider changing your content so that it is no longer duplicated. It sounds like a pain, but it’s often the easier recourse and is much better than having your pages filtered out of the index!
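If you want to gauge how much of a suspect page overlaps with yours, one common technique is w-shingling: break both texts into overlapping runs of words and compare the sets with Jaccard similarity. This is a minimal sketch of that general technique, not how Copyscape or Google Alerts work internally; the five-word shingle size and the sample text are assumptions.

```python
import re


def shingles(text, size=5):
    """Break text into overlapping runs of `size` words ("shingles")."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Too short to shingle: treat the whole text as a single shingle
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(original, suspect, size=5):
    """Jaccard similarity of the two shingle sets: 0.0 (nothing shared) to 1.0 (identical)."""
    a, b = shingles(original, size), shingles(suspect, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


mine = "Product pages are a goldmine for duplicate content because they are built from a single template."
scraped = "Visit our sponsors! " + mine
print(round(similarity(mine, scraped), 2))  # 0.8
```

A score close to 1.0 on a page you don’t own is a good sign your text has been lifted wholesale.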
Any time the search engines find the same page at multiple URLs, you have a duplicate content situation. For example, if several variations of your home page URL (with and without the “www,” or with “index.html” tacked on the end) all bring up the same page, you have a problem.
Your customers may be able to figure out what’s going on, but the search engines will see several versions of the same page and pick for themselves which one gets to live in the index. You don’t want Google or Yahoo making these very important site decisions for you. Figure out how you want people linking to you and stick to it. 301 redirect the other versions of the page to the lucky URL you decided on.
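Here is a minimal sketch of the “pick one URL and 301 everything else to it” rule. It assumes, hypothetically, that the chosen version is `http://www.example.com` and that default documents such as `index.html` should collapse onto their directory URL; `canonical_url` and `redirect_for` are illustrative names, not part of any particular server.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical choice: the "lucky" version of the site is http://www.example.com
CANONICAL_HOST = "www.example.com"
SITE_HOSTS = {"example.com", "www.example.com"}
DEFAULT_DOCUMENTS = {"index.html", "index.htm", "index.php", "default.aspx"}


def canonical_url(url):
    """Rewrite a URL on this site to its single preferred form."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host in SITE_HOSTS:
        host = CANONICAL_HOST
    path = parts.path or "/"
    # Collapse /index.html (and similar) onto the directory URL
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"
    return urlunsplit(("http", host, path, parts.query, ""))


def redirect_for(url):
    """Return (301, target) if the URL is a duplicate version, else None."""
    target = canonical_url(url)
    return (301, target) if target != url else None


print(redirect_for("http://example.com/index.html"))  # (301, 'http://www.example.com/')
```

Whatever form you choose, the point is that every alternate version answers with the same permanent redirect to one address.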
Sites can also get themselves into trouble when they begin using ugly parameters to track their customers’ movements through their site. Not only does this present a duplicate content issue because the engines can access the same page through multiple URLs, but it’s also not particularly user-friendly. You also run the risk of skewing your analytics data if the parameter-filled URL gets indexed and users start using it to click through from the SERPs.
If you’re going to put parameters in your URLs for tracking purposes, you have a few options. You can use mod_rewrite to keep the parameterized URL from being spidered, or simply redirect it to the URL without the tracking parameter. If you do opt for the latter, make sure it doesn’t mess up your ability to track. Sometimes things get buggy.
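The redirect option can be sketched in a few lines. The parameter names in `TRACKING_PARAMS` are assumptions (swap in whatever your site actually uses). To address the “don’t break your tracking” concern, `tracking_redirect` returns the tracking values alongside the clean URL so you can record them before sending the 301.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of parameters this site uses only for tracking visitors
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "ref"}


def strip_tracking(url):
    """Return the URL with tracking parameters removed; other parameters keep their order."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


def tracking_redirect(url):
    """Return (301, clean_url, tracking_values) if the URL carries tracking parameters, else None."""
    tracked = {key: value for key, value in parse_qsl(urlsplit(url).query)
               if key.lower() in TRACKING_PARAMS}
    if not tracked:
        return None
    # Log `tracked` to your analytics first so the visit is still attributed, then redirect
    return 301, strip_tracking(url), tracked


print(strip_tracking("http://www.example.com/shoes?color=red&utm_source=newsletter"))
# http://www.example.com/shoes?color=red
```

Note that `urlencode` may re-encode the remaining values (a space becomes `+`, for instance), which is harmless but worth knowing if you compare URLs as strings.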
Multiple Site Issues/Mirror Sites
Bruce Clay has offices in the United States, South Africa, the UK and beyond. To make sure the search engines recognize that these are different sites even though the content may be fairly similar, we’re careful to create content specific to each country and to take care of the technical aspects as well, like using country-specific TLDs, hosting each site in the country it’s targeting and specifying the target country in Webmaster Tools. Matt Cutts has noted that site owners need not worry too much about duplicate content across different top-level domains: Google is able to tell which site should rank and then filter out the dupes.
Along the same lines, you want to be careful of mirror sites that simply republish the same content on multiple domains. For example, some sites have multiple domains like http://www.mydomain.com, http://www.my-domain.com, http://www.mymisspelleddomain.com that are all mirroring the same content. The solution would be to 301 redirect the duplicate domains to the main domain. This not only helps eliminate duplicate content issues, it also makes sure you’re not wasting any link popularity.
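As a rough illustration of redirecting mirror domains, here is a minimal sketch written as Python WSGI middleware. The domains come from the example above; the bare (non-www) variants are an added assumption, and `redirect_mirrors` and `site` are hypothetical names. Many sites handle this in their web server configuration instead; the logic is the same.

```python
from urllib.parse import quote

# Main domain and mirrors from the example above (bare non-www forms are an assumption)
PRIMARY_HOST = "www.mydomain.com"
MIRROR_HOSTS = {"mydomain.com", "www.my-domain.com", "my-domain.com",
                "www.mymisspelleddomain.com", "mymisspelleddomain.com"}


def redirect_mirrors(app):
    """WSGI middleware: 301 any request on a mirror domain to the same path on the main domain."""
    def middleware(environ, start_response):
        # HTTP_HOST may include a port, e.g. "my-domain.com:8000"
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host in MIRROR_HOSTS:
            location = "http://" + PRIMARY_HOST + quote(environ.get("PATH_INFO", "") or "/")
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Stand-in for the real site served on the main domain."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Welcome to www.mydomain.com"]


application = redirect_mirrors(site)
```

`application` can be served by any WSGI server (for local testing, `wsgiref.simple_server.make_server`). Keeping the path and query string in the redirect means links pointing at deep pages on a mirror pass their value to the matching page, not just the home page.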
Product pages are a goldmine for duplicate content because they’re very often built from a single template. This means they’ll typically share the same basic description, with just a few words altered to tell the customer that the shoe they’re looking at also comes in red, black, brown and blue, as well as in suede and patent leather. Your customers may love you for all of your available options, but the search engines are likely to be confused about why you have virtually identical content on several pages of your site. Not that they’ll penalize you for it. They’ll simply “help” you by “filtering” out all the extras.
You really have three options when it comes to this one.
- Take on a massive project and write unique content for all of your product pages
- Update your robots.txt so that only one product description (preferably the one that provides the most revenue) is crawled
- Consolidate your product pages and use another method to show all the different styles and options, perhaps using CSS or some other fancy hover-type creation.
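For the consolidation option, one way to plan the change is to merge the variant pages into a single record per product (whose option lists can drive a CSS or hover widget) and build a map for 301 redirecting the old variant URLs. This is a minimal sketch; the product data, URL pattern and `consolidate` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical variant pages: one URL per colour/material of the same shoe
VARIANT_PAGES = [
    {"url": "/shoes/oxford-red-suede", "product": "oxford", "color": "red", "material": "suede"},
    {"url": "/shoes/oxford-black-suede", "product": "oxford", "color": "black", "material": "suede"},
    {"url": "/shoes/oxford-black-patent", "product": "oxford", "color": "black", "material": "patent leather"},
]


def consolidate(pages):
    """Merge variant pages into one record per product plus a 301 redirect map."""
    options = defaultdict(lambda: {"color": set(), "material": set()})
    redirects = {}
    for page in pages:
        product = page["product"]
        options[product]["color"].add(page["color"])
        options[product]["material"].add(page["material"])
        # Every old variant URL points at the single consolidated product page
        redirects[page["url"]] = "/shoes/" + product
    # Sort option values so the page lists them in a stable order
    merged = {name: {key: sorted(values) for key, values in opts.items()}
              for name, opts in options.items()}
    return merged, redirects


products, redirect_map = consolidate(VARIANT_PAGES)
print(products["oxford"])  # {'color': ['black', 'red'], 'material': ['patent leather', 'suede']}
```

The result is one page with one description to write well, instead of many thin near-copies.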
Block Printer Pages
It’s good that you provide printer-friendly pages for readers and customers looking to take your content with them, but there’s absolutely no reason the spiders need to know about them. These pages may be super usable for your audience, but they provide no links back to your site for the engines to follow, and they’re just going to diminish the apparent uniqueness of your content. Put these pages in their own folder and then disallow it in your robots.txt. It’s that simple.
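Before relying on a new robots.txt rule, it’s worth checking that it blocks what you intend and nothing more. This sketch uses Python’s standard `urllib.robotparser`; the `/print/` folder name and `example.com` URLs are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that keeps printer-friendly pages under /print/
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print_url = "http://www.example.com/print/duplicate-content.html"
blog_url = "http://www.example.com/blog/duplicate-content.html"
for url in (print_url, blog_url):
    status = "crawlable" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, status)
```

The printer version should come back as blocked and the regular article as crawlable; if the regular page is blocked too, the Disallow path is broader than you meant.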
What other forms of duplicate content do you commonly come across?