Duplicate Content and Multiple Site Issues — SES San Francisco


Adam Audette, President, AudetteMedia, Inc.


Another session for the morning.  Looks like a great panel with Shari, Michael and Kathleen.  I know this is a topic many of my clients deal with so I have no doubt it’ll be useful information for everyone.


First up is Shari. She’s going to explain duplicate content, why it poses a problem, plus how search engines determine duplicate content, and finally, an overview of recommendations and solutions.

Duplicate content doesn’t have to be an exact match; search engines look for resemblance.  When duplicate content is managed well, a site tends to receive more traffic: a site that removes or manages its duplicates can expect to see more visitors, searches and possibly conversions over the next few months.  If your site delivers duplicate content, some of its pages are going to be filtered out of the search results.  Your best converting page may be unable to rank because a duplicate version of it ranks instead.  One example is a site where Google autofilled the site search box, which caused the search results pages to be indexed and to rank. The site’s money pages didn’t rank, causing huge issues. Once they modified their robots.txt file to exclude the site search results pages, the money pages soon began to rank, solving the problem.
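To make the robots.txt fix concrete, here is a minimal sketch in Python. The `/search` path and the example.com URLs are assumptions for illustration; the post does not say which paths the site actually blocked. It uses the standard library's `urllib.robotparser` to check which URLs a rule would keep crawlers away from.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /search path is an assumption,
# not the real layout of the site described in the session.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if the given robots.txt rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/products/shoes",
        "http://www.example.com/search?q=shoes",
    ):
        print(url, "crawlable" if is_crawlable(url) else "blocked")
```

Checking a few money pages and a few search results pages this way before deploying a rule is a cheap way to avoid blocking the wrong thing.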

Quick duplicate content checklist:

  • Boilerplate templates
  • Linkage properties (inbound and outbound links)
  • Host name resolution
  • Shingles (word sets)
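The shingles idea can be sketched as follows. This is only an illustration of the general technique: the shingle size of three words and the use of Jaccard similarity as the "resemblance" score are assumptions, since the session did not say what the engines actually use.

```python
def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping word sets ("shingles") of `size` words."""
    words = text.lower().split()
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def resemblance(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets: 1.0 is identical, 0.0 is disjoint."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    page_a = "the quick brown fox jumps"
    page_b = "the quick brown fox sleeps"
    print(resemblance(page_a, page_b))
```

Two pages that differ by a single word still share most of their shingles, which is why a near copy can be treated as a duplicate even though it is not an exact match.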

So how do you deal with duplicate content?

  1. Build a good site in the first place, with good information architecture, site navigation and page interlinking.
    • Are URLs linked to consistently throughout the site or the site network?
    • Are the links labeled consistently?
  2. Robots.txt
    • You’re able to prevent your duplicate pages from even being crawled
  3. Robots Meta tag
    • If articles are shared across your network of sites, are you implementing NOINDEX, NOFOLLOW appropriately?
  4. Canonical tag
    • <link rel="canonical" href="http://www.example.com">
  5. Redirects (301)
  6. NOFOLLOW attribute
  7. Web search engine webmaster tools
  8. Sitemap (XML)
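As a rough sketch of items 3 and 4 above, the helpers below build a robots Meta tag and a canonical tag as strings. The function names are hypothetical and not from the session; a real site would more likely emit these from its templates.

```python
from html import escape


def canonical_link(url: str) -> str:
    """Build a canonical link element pointing at the preferred URL."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


def robots_meta(index: bool = True, follow: bool = True) -> str:
    """Build a robots Meta tag from two on/off choices."""
    content = ", ".join([
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ])
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    # A syndicated copy that should stay out of the index:
    print(robots_meta(index=False, follow=False))
    # A page variant that should credit the main URL:
    print(canonical_link("http://www.example.com/"))
```

Both tags go in the page's head section. Generating them from one place makes it easier to stay consistent across a network of sites.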

Consistency is important.  Don’t say one thing then do another with the coding of your site. Provide the search engines with clues as to which pages you want to appear in search results, then they are most likely to find the right pages.

With that, Shari ends early because her voice is giving out after her last solo session.  Moving on to Kathleen, who works at Pogo [one of my favorite time-waster sites].

She starts with a yummy looking slide of gourmet cupcakes [evil before lunch. Good thing Susan isn’t in here… she might devour the screen].  She says the image shows that duplicate content can sometimes be a good thing, sometimes not so much.

There are two types of duplicate content that Kathleen is going to cover: naughty and nice.  There are sometimes legitimate reasons why a site might have duplicate content.  For example, www.site.com, site.com and site.com/index.html may all serve the same page.  Other examples are printer-friendly pages and blog category and tag pages. Syndicated content can also create duplicate content.  Kathleen says these are all legitimate reasons for having duplicate content, and you aren’t meaning to do evil with it.
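The host name case is usually handled with a 301 redirect to one preferred URL. Below is a minimal sketch of that decision, assuming www.site.com is the preferred host and that index.html and index.htm are the default documents; both are assumptions chosen to match the example above.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.site.com"                  # assumption: the www form is preferred
DEFAULT_DOCUMENTS = ("index.html", "index.htm")  # assumption: typical default documents


def preferred_url(url: str) -> str:
    """Map the variants of a URL (bare host, /index.html) onto one preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == PREFERRED_HOST.removeprefix("www."):
        host = PREFERRED_HOST
    path = parts.path or "/"
    for doc in DEFAULT_DOCUMENTS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]  # keep the trailing slash, drop the file name
            break
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def redirect_for(url: str):
    """Return (301, target) when the URL is not the preferred form, else None."""
    target = preferred_url(url)
    return None if target == url else (301, target)


if __name__ == "__main__":
    for url in ("http://site.com", "http://site.com/index.html", "http://www.site.com/"):
        print(url, "->", redirect_for(url))
```

All three variants end up at http://www.site.com/, so links and crawlers converge on a single URL.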

The naughty side of it is when you multiply your content by putting it across different domains.  Or how about stealing another well-ranked site’s content? Those qualify as naughty. Naughty duplicate content is content you put out there, knowingly or unknowingly, to inflate the amount of content you have so that you rank better.

So what are the consequences? Well, you are no longer going to be completely blacklisted.  You’ll just start to see that your site gets less visibility in the search engines for certain words. Or you may see that one particular page that you don’t necessarily feel is a good landing page is getting more traffic than the real, user-friendly version of the page. This can affect your conversion rate.

Next are some examples of issues Pogo overcame with duplicate content.

They had multiple URLs that allowed a user to reach the same page.  This was caused by tracking codes appended to the URLs, which made the search engines treat them as unique URLs when really each was the same URL with a different user ID or tracking code attached.  To overcome that, they put in code to find out who was visiting the site: was it a user or a search engine? If it was an engine, it got the straight URLs, whereas a human got the URLs with the tracking codes on them.
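To show what a "straight" URL means here, the sketch below strips tracking parameters from a URL. It is not Pogo's code: the parameter names `uid`, `trk` and `ref` are hypothetical, and the crawler-detection step described above is not shown.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameter names; a real site would list its own.
TRACKING_PARAMS = {"uid", "trk", "ref"}


def strip_tracking(url: str) -> str:
    """Return the URL with tracking parameters removed from the query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    print(strip_tracking("http://www.example.com/games/chess?uid=12345&trk=email"))
    print(strip_tracking("http://www.example.com/games?page=2&uid=9"))
```

Parameters that change the page content, such as `page`, are kept, so only the tracking variants collapse into one URL.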

Another version was millions of look-alike profile pages. The problem was that none of their users were changing the content on the profile pages… leaving them with millions of crawlable, duplicate pages. These were important pages to Pogo because they had some keyword value. To resolve it, they disallowed the engines from these pages altogether.

The last issue was look-alike game download pages. Their download pages for games were all generic and duplicated. Their solution was to write unique content about each game for its download page.

Kathleen’s 5 best practices:

  • Determine if you have it.
    • Is the same info located in multiple places on your site?
    • Is every page valuable?
    • Naughty or nice?
  • Leverage resources.
    • Talk to other departments in your company.
    • Consult with your agency.
    • Research industry sites.
    • Review webmaster forums [but be careful there… take some of that info with a grain of salt].
    • Talk to industry peers.
  • Be proactive.
    • Write unique page content.
    • Identify authority pages.
    • Be aware of engine updates.
    • Manage syndicated content.
  • Manage syndicated content effectively.
    • Allow ample time for your original content to be indexed before giving it to other sites.
    • Require links back.
    • Require condensed versions.
    • Use generic Meta data.
  • Don’t freak out.
    • There is no specific penalty.
    • There are legitimate reasons.
    • Duplicate content may not be naughty.
    • There are usually multiple solutions.

Last up is Michael, aka Graywolf, and like the moderator says, he really needs no introduction.

Michael says that in most cases you want to avoid duplicate content, but sometimes it can be used as a weapon. When you syndicate content, most of the time it’ll be taken as a whole and not modified at all. Use this as an opportunity to get links. Syndicate to trusted partners.  If your content gets published on a trusted source with more authority than your site, it’s okay to let them take “ownership” of that content just to get the value of the link back to your site.

With some of your duplicated content (that’s published on other sites) you can modify your Meta information, which will allow you to rank differently for the same content.

Sometimes duplicate content can be used to get visibility. Michael points to Jason Calacanis of Mahalo who writes articles that cause reactions. In one case he wrote an article and waited for people to reprint it just to get links from large news and authority sites back to Mahalo.

Why Michael loves scrapers:

  • Most Web scrapers search for keywords and leave links in the content.
  • Use this to your advantage by linking to yourself with high value focused keyword anchor text.
  • As long as you offset these low-value links with some mid- and high-level links, it will work to your advantage.
  • Always insert links back to the original website.
  • Change the anchor text, link and surrounding text every 3-4 months to get different credit.


  • Look for opportunities to syndicate your duplicate content to gain attention, exposure and links from trusted sites.
  • Refine your copy to target more keywords.
  • Be on the lookout for people who may be re-using your content who aren’t helping you.
  • Allow your blog and RSS feed to be syndicated with self-referencing and keyword-focused links to commercial pages.

One Reply to “Duplicate Content and Multiple Site Issues — SES San Francisco”

With the advent of Blogging, Google seems to be less aggressive with the duplicate content stuff. All we see is the oldest content is returned first in a search, and the duplicates will be further back in the search results.
