Understanding Duplicate Content and How to Avoid It
Even if it’s not done on purpose, duplicate content can hurt your optimization efforts and taint the search experience for your customers.
At Bruce Clay, we recommend watching out for duplicate content as a way to create a better experience for your users and improve your visibility in search.
Google is smarter today than it’s ever been, and it can usually tell deceptive practices apart from a simple lack of SEO skill.
With this in mind, a solid understanding of this issue is essential to SEO. When you avoid or repair duplicated pages, your customers get to see the content you want them to see. Plus, you get to communicate to Google that you’re not being deceptive.
To help you get started, we define duplicate content, clarify the two types according to Google, and share the consequences of each. Then, we’ll show you how to spot 10 specific causes and resolve them.
Table of Contents
- What is duplicate content?
- Is duplicate content bad?
- Is there a penalty?
- Consequences of similar content and plagiarism
- 10 common issues and how to fix them
What is Duplicate Content?
Duplicate content is a term used by search engines such as Google to describe two main types of content issues with websites:
- Sites with many pages of identical or similar content.
- Sites that feature plagiarized, or scraped, content from other sites.
Google defines duplicate content this way:
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar.
Is Duplicate Content Bad?
Duplicate content is not as big a problem as it once was. Still, it can affect both the search experience and your SEO. So yes, left unaddressed and without a legitimate reason, duplicate content can be bad. How bad the consequences are depends on the type of issue you’re dealing with.
Is there a Duplicate Content Penalty?
No, Google does not have a duplicate content penalty. The search engine says:
Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.
Further, Google recognizes that duplicate content that occurs within a site is “mostly not deceptive in origin.”
Consequences of Duplicate Content
Since there are two main types of duplicate content, we’ll discuss both and how they might affect SEO and the search experience for your customers.
Type 1: Pages of Similar or Identical Content within a Site
If your site features pages of largely identical content, Google picks the best page for the search results. When this happens, the webpages deemed duplicate get filtered from the SERPs.
While Google finds this helpful to searchers — because it wants to show pages with unique information — the opposite might be true for your business. You might not agree that the page Google picks to show for a query is the best one. And your customers might be missing out on that one page they’re looking for.
For example, an ecommerce site might have several URLs for “boys ski jackets” — perhaps a category page for boys outerwear like “jackets – ski,” another for “ski clothes – jackets – boys,” and so on. If a site has faceted search options (such as a filter menu down the left column for Brands, Styles, Colors, etc.), different pages can result that really have the same contents.
When all of these pages look the same, Google doesn’t treat that as a violation. However, only one will make the cut, and the other variations will be filtered from search results. As Google puts it:
If your site suffers from duplicate content issues, … we do a good job of choosing a version of the content to show in our search results.
Type 2: Scraped or Spam Content on Different Sites
On the other hand, scraped content is considered spam and falls into the second duplicate content category. Sites with scraped content could be impacted by a manual penalty from the search engine.
Such sites can also be hit by search engine algorithms that target low-quality content and demote it in the rankings.
Common Duplicate Content Issues and How to Fix Them
Before we dive into specifics, this video gives an overview of how to resolve duplicate content issues on your website.
Let’s go over some scenarios that might cause issues on your own website. Please note this is not an exhaustive list but addresses today’s most common issues leading to duplicate content on your site.
- Two site versions
- Separate mobile site
- Trailing slashes on URLs
- CMS problems
- Meta information duplication
- Similar content
- Boilerplate content
- Parameterized pages
- Product descriptions
- Content syndication
Issue 1: Dueling Versions of Your Site
You might create two copies of your site in the search index if you haven’t told search engines like Google which version of the site you want indexed — the www version (for example, www.bruceclay.com) or the non-www version (bruceclay.com).
The same can happen if you have two copies of your site via http:// and https://.
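Here’s how to handle it: Pick one preferred version of your site (www or non-www, on https://) and stick with it. Redirecting the non-www version to the www version is a common choice. Then add a domain-level 301 redirect from every other version to the preferred one. Google Search Console used to offer a preferred-domain setting for this, but Google retired it in 2019, so the redirect now does the work.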
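If you want to see the redirect logic spelled out, here is a minimal Python sketch. It assumes a hypothetical domain, example.com, with https://www. chosen as the preferred version. In practice this rule lives in your web server or CDN configuration; the function only shows which URL each variant should 301 to.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site: replace with your own domain and preferred version.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
SITE_HOSTS = {"example.com", "www.example.com"}


def redirect_target(url: str) -> Optional[str]:
    """Return the URL a request should be 301 redirected to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in SITE_HOSTS:
        return None  # not one of this site's versions
    if parts.scheme == PREFERRED_SCHEME and host == PREFERRED_HOST:
        return None  # already the preferred version
    # Keep the path and query so each page redirects to its own counterpart,
    # not to the home page.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, ""))
```

For example, `redirect_target("http://example.com/blog/")` returns "https://www.example.com/blog/", while a URL already on the preferred version returns None. Mapping page to page keeps visitors and search engines on the equivalent page instead of dumping them on the home page.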
Issue 2: Mobile Sites and Duplicate Content
Some sites have a separate mobile site (versus a responsive site, which is recommended and avoids duplicating content), and this requires maintaining two separate websites with different URLs. If you’re in this scenario, you probably have similar or identical copies of your pages.
Here’s how to handle it: Ideally, a separate m-dot site should be converted to a responsive design. If that is not possible, add <link> tags that tell Google how the two versions of each page relate: a rel="alternate" tag on the desktop page pointing to its mobile URL, and a rel="canonical" tag on the mobile page pointing back to the desktop URL. Make sure you redirect correctly, following Google’s guidance on separate mobile URLs.
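As a rough illustration, the sketch below generates the two annotations for one pair of pages. The example.com URLs are hypothetical, and the media query mirrors the one in Google’s example for separate mobile URLs; use the breakpoint your site actually serves the m-dot version at.

```python
from html import escape

# Mirrors the media query in Google's separate-URLs example; match your own breakpoint.
MOBILE_MEDIA = "only screen and (max-width: 640px)"


def desktop_link_tag(mobile_url: str) -> str:
    """Goes in the <head> of the desktop page and points to its mobile version."""
    return f'<link rel="alternate" media="{MOBILE_MEDIA}" href="{escape(mobile_url)}">'


def mobile_link_tag(desktop_url: str) -> str:
    """Goes in the <head> of the m-dot page and names the desktop page as canonical."""
    return f'<link rel="canonical" href="{escape(desktop_url)}">'
```

Each desktop page gets its own alternate tag and each mobile page its own canonical tag. A single site-wide tag pointing at the home page would not describe the page-to-page relationship.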
Issue 3: Trailing Slashes on URLs
When you have a trailing slash at the end of a URL and the same page exists under a URL without the slash, you are essentially creating two pages.
For example: www.bruceclay.com/blog/duplicate-content/ versus www.bruceclay.com/blog/duplicate-content
Here’s how to handle it: Like the www vs. non-www issue, you’ll want to pick the preferred URL format and stick with it. Then 301 redirect the duplicate URLs that exist to the preferred URL. Consistency is key, so also make sure your internal navigation links point to the correct URL versions.
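Here is a minimal sketch of the normalization step, assuming your preferred format ends page URLs with a slash. It follows Mueller’s point below that a slash on the bare hostname doesn’t matter, and it treats any path whose last segment contains a dot (such as /sitemap.xml) as a file to leave alone. Both rules are assumptions to check against your own site.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def trailing_slash_redirect(url: str) -> Optional[str]:
    """Return the slash-terminated preferred URL, or None if no redirect is needed."""
    parts = urlsplit(url)
    path = parts.path
    # A slash on the root/hostname doesn't matter, so leave it alone.
    if path in ("", "/"):
        return None
    last_segment = path.rsplit("/", 1)[-1]
    # Already preferred, or looks like a file (e.g. /feed.xml): no redirect.
    if path.endswith("/") or "." in last_segment:
        return None
    return urlunsplit((parts.scheme, parts.netloc, path + "/", parts.query, parts.fragment))
```

The same function can be run over your internal links to find any that point at the non-preferred version.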
John Mueller of Google sent a tweet with a handy chart to summarize when trailing slashes matter:
I noticed there was some confusion around trailing slashes on URLs, so I hope this helps. tl;dr: slash on root/hostname=doesn’t matter; slash elsewhere=does matter (they’re different URLs) pic.twitter.com/qjKebMa8V8
— 🍌 John 🍌 (@JohnMu) December 19, 2017
Issue 4: Duplicate Content from Your CMS
Your content management system (CMS) may be creating duplicate content. For example, some ecommerce platforms create URLs with product categories that can cause duplicate content issues.
Here’s how to handle it: Some CMSs inherently create content problems that can’t be worked around. In other cases, depending on how the content is being duplicated, you can take steps to improve the situation. For example, this Search Engine Land article gives advice on how to tackle duplicate content in Magento.
Issue 5: Meta Information Duplication
The meta information on a page (title, description) is one of the first blocks of text content that a search engine encounters. When you have multiple pages that have the same or similar meta information, they can look like duplicate content.
Here’s how to handle it: Ensure each of your pages has a unique title and meta description if possible. If you’re on a WordPress site, the Bruce Clay SEO plugin has a duplicate content checker that can alert you when pages have identical meta information.
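If you have crawl data from outside WordPress, a quick check for repeated titles or descriptions is easy to script. This hypothetical Python sketch assumes you’ve already collected each page’s title and meta description into a dictionary keyed by URL; it is not how the Bruce Clay SEO plugin works internally.

```python
from __future__ import annotations

from collections import defaultdict


def find_duplicate_meta(pages: dict[str, dict[str, str]], field: str) -> dict[str, list[str]]:
    """Group URLs whose `field` ("title" or "description") is identical after normalizing."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, meta in pages.items():
        # Ignore case and extra whitespace when comparing values.
        value = " ".join(meta.get(field, "").lower().split())
        if value:
            groups[value].append(url)
    # Keep only values shared by two or more pages.
    return {value: sorted(urls) for value, urls in groups.items() if len(urls) > 1}
```

Each key in the result is a repeated title or description, and its list shows which pages need a rewrite.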
Issue 6: Similar Content
Similar content refers to pages on your site that cover the same topic in different ways. Search engines like Google may not consider this duplicate content, per se. But Google will still choose which page to display in the search results for a given query and filter out the others, so you won’t see all of them competing.
Here’s how to handle it: Do an audit of the pages on your site that are topically the same. Find out which are already ranking and getting traffic. Then consider combining content (and doing a quality edit). Fold some of those non-performing pages into the pages that are already performing (with a 301 redirect).
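To find candidate pages for the audit, you could score how much text each pair of pages shares. Google doesn’t publish how it groups similar pages, so the sketch below swaps in a common generic technique: Jaccard similarity over three-word "shingles." The 0.5 threshold is an arbitrary starting point, and the page text is assumed to be already extracted from the HTML.

```python
from __future__ import annotations

import re
from itertools import combinations


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Overlapping runs of `size` words, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pairs(pages: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return page pairs whose similarity meets the threshold, most similar first."""
    sets = {url: shingles(text) for url, text in pages.items()}
    results = []
    for u, v in combinations(sorted(sets), 2):
        score = jaccard(sets[u], sets[v])
        if score >= threshold:
            results.append((u, v, round(score, 2)))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

High-scoring pairs are where to look first; your traffic and ranking data still decide which page should survive.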
Issue 7: Boilerplate Content
Boilerplate content can include text that is the same across every page. For example, certain industries have legally required disclaimers that have to be shown on every page. Or you might have terms and conditions text.
Google understands that this type of boilerplate text may be required and does not count it against a site. Required text like this is especially common on YMYL (Your Money or Your Life) pages. However, you still need unique content to provide value to your users and make your page stand out in search.
Here’s how to handle it: If possible, create individual webpages for all your boilerplate content. Then create a link to those pages on the site, for example, in the footer.
Issue 8: Duplicated Pages with Parameters
Some websites have many versions of pages because of parameters, which are codes appended to the end of a URL. For example, different product colors or sizes may serve the same page with just slight variations. Or a user’s session ID may be appended to the URL as a parameter. When these are used in links to the site, search engines may find and index the duplicate versions.
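Here’s how to handle it: Google used to offer a URL Parameters tool in Search Console that let you specify how Google should treat URL parameters on your site, but Google retired it in 2022. Instead, point parameterized variants to the clean URL with rel="canonical", keep your internal links pointing to the parameter-free version, and avoid putting session IDs in URLs.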
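Here is a minimal sketch of the clean-URL calculation. The parameter names are hypothetical: a session ID, campaign tags (utm_*), and color/size options that, on this imaginary site, don’t change the page’s main content. Each variant would then carry a rel="canonical" tag pointing at the URL this function returns.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that do not change the main content on this site.
IGNORED_PARAMS = {"sessionid", "color", "size"}
IGNORED_PREFIXES = ("utm_",)  # campaign-tracking parameters


def canonical_url(url: str) -> str:
    """Strip content-neutral parameters and sort the rest so variants share one URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS and not key.lower().startswith(IGNORED_PREFIXES)
    ]
    kept.sort()  # ?a=1&b=2 and ?b=2&a=1 become the same URL
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))
```

Sorting the remaining parameters matters because links built by different templates often list the same parameters in a different order.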
Issue 9: Product Descriptions
Using manufacturer descriptions for product content can create identical copy issues. Those same paragraphs may be used on sales pages across hundreds of websites.
Search engines like Google may expect that product descriptions will be the same or similar. But if your pages do not give any unique value to searchers, they will be filtered out of search results.
Here’s how to handle it: If Google expects this kind of thing, you’d think there would be no problem. But it is best to either rewrite the product descriptions to make them more unique or add at least 200 extra unique words on the page to demonstrate expertise and give more detail on the product.
This can be tedious work, so prioritize your most profitable product pages and work your way through the list. We have seen this type of content investment yield huge SEO gains. For more details, see our article on thin content.
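If you want to track progress against the 200-word guideline, a rough counter is enough. This Python sketch assumes you have the page’s visible text and the manufacturer’s stock description as plain strings, and it only removes an exact (whitespace-normalized) match of the stock copy. Word count is only a rough proxy for added value.

```python
MIN_EXTRA_WORDS = 200  # rule of thumb from the advice above


def extra_word_count(page_text: str, manufacturer_text: str) -> int:
    """Count words on the page outside the manufacturer's stock description."""
    # Normalize whitespace so line breaks in the HTML don't block the match.
    page = " ".join(page_text.split())
    stock = " ".join(manufacturer_text.split())
    remaining = page.replace(stock, " ") if stock else page
    return len(remaining.split())


def needs_more_copy(page_text: str, manufacturer_text: str) -> bool:
    """True if the page adds fewer than MIN_EXTRA_WORDS words of its own."""
    return extra_word_count(page_text, manufacturer_text) < MIN_EXTRA_WORDS
```

Running this across your product catalog gives you a list of pages still at or near zero original words, which pairs well with the profitability order suggested above.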
Issue 10: Content Syndication
When you syndicate your content across other authoritative sites, the site that ranks for your content may not be your own.
The December 2019 version of Google’s Search Quality Evaluator Guidelines has this to say on page 40:
We do not consider legitimately licensed or syndicated content to be “copied” (see here for more on web syndication). Examples of syndicated content in the U.S. include news articles by AP or Reuters.
In other words: content syndication has its place. This article has a lot of good information on understanding syndication.
Here’s how to handle it: The easiest way for your content to still benefit from SEO when it’s syndicated on other sites is to have each syndicated copy include a rel="canonical" tag pointing to your original. This can pass PageRank from the syndicated copy to the original source: your content.
When that is not allowed, Google suggests these remedies:
- Block the syndicated copy from being indexed by including a noindex meta tag.
- Add a link back to the original article within the body of the syndicated article.
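To make these options concrete, this sketch builds the markup a syndication partner could add to its copy of your article. The helper name and URL are hypothetical, and the link-back wording is only an example. The canonical and noindex tags belong in the page’s <head>; the link goes in the article body.

```python
from __future__ import annotations

from html import escape


def syndicated_copy_markup(original_url: str, canonical_allowed: bool) -> dict[str, list[str]]:
    """Markup a partner site could add to its copy of your article (hypothetical helper)."""
    href = escape(original_url)
    if canonical_allowed:
        # Preferred option: point the copy's canonical at your original.
        return {"head": [f'<link rel="canonical" href="{href}">'], "body": []}
    # Fallbacks: keep the copy out of the index and link back to the original.
    return {
        "head": ['<meta name="robots" content="noindex">'],
        "body": [f'<p>This article first appeared on <a href="{href}">{href}</a>.</p>'],
    }
```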
Scraped Duplicate Content Issues and How to Solve Them
When one website copies the content of another, this is often referred to as scraping. Many people call it a form of duplicate content, but in reality it is spam and old-fashioned plagiarism.
In this video, Google representatives address duplicate content as spam.
You can find out if your website content exists elsewhere on the web by using a plagiarism checker tool such as Copyscape.
You can also search Google for passages of your content, both with and without quotation marks, to discover duplicate copies.
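A small helper can pick distinctive sentences from a page and build both versions of the search. This sketch assumes plain-text input and simply treats the longest sentences as the most distinctive. It only builds URLs for you to open in a browser; it doesn’t query Google automatically.

```python
from __future__ import annotations

import re
from urllib.parse import urlencode

SEARCH_BASE = "https://www.google.com/search?"


def longest_sentences(text: str, n: int = 2) -> list[str]:
    """Return the n longest sentences, a rough stand-in for the most distinctive ones."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return sorted(sentences, key=len, reverse=True)[:n]


def search_urls(snippet: str) -> dict[str, str]:
    """Build an exact-match (quoted) and a loose Google search URL for one snippet."""
    phrase = " ".join(snippet.split())
    return {
        "exact": SEARCH_BASE + urlencode({"q": f'"{phrase}"'}),
        "loose": SEARCH_BASE + urlencode({"q": phrase}),
    }
```

The quoted search finds verbatim copies; the unquoted one can surface lightly reworded versions.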
Note that Google’s Mueller says that scraped content shouldn’t be a problem unless the other site’s page is ranking for queries you care about.
Here’s how to handle it: If your content has been scraped, follow these steps:
- Check if the page gives credit to your site. It might have a noindex directive on the page; a canonical attribute pointing to your original content; text saying that it was published on your site; or a link. If so, then you might not have to do anything.
- If the page does not give you credit, contact the webmaster and request that they take it down. Within the U.S., you can reference the Digital Millennium Copyright Act (DMCA). Sometimes it takes some persistence.
- File a takedown request with Google (more details here).
- If the problem is rampant (in other words, lots of sites have your content indexed at this point), then rewrite your own content to make it unique and even better than it was before.
- You might also consider WordPress plugins that can help combat scrapers on an ongoing basis.
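For the first step in that list, a short script can check a suspect page’s HTML for the three machine-readable forms of credit. This sketch uses Python’s built-in html.parser, assumes you’ve already downloaded the page, and compares URLs after trimming a trailing slash. A text mention of your site still needs a manual look.

```python
from __future__ import annotations

from html.parser import HTMLParser


class CreditChecker(HTMLParser):
    """Collects the machine-readable ways a copied page can credit the original."""

    def __init__(self, original_url: str) -> None:
        super().__init__()
        self.original = original_url.rstrip("/")
        self.noindex = False
        self.canonical = False
        self.link = False

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names; values may be None.
        a = {name: (value or "") for name, value in attrs}
        href = a.get("href", "").rstrip("/")
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.noindex = self.noindex or "noindex" in a.get("content", "").lower()
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = self.canonical or href == self.original
        elif tag == "a" and href == self.original:
            self.link = True


def credit_signals(html: str, original_url: str) -> dict[str, bool]:
    """Report whether the page uses noindex, a canonical to you, or a link to you."""
    checker = CreditChecker(original_url)
    checker.feed(html)
    checker.close()
    return {"noindex": checker.noindex, "canonical": checker.canonical, "link": checker.link}
```

If any value comes back True, the page may already be crediting you and you might not need to do anything further.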
If your site is the one that appears to have copied content from another website, you may experience ranking problems.
As mentioned earlier, your page will probably be filtered out of search results. You might get a manual penalty from Google (especially if the issue is widespread on your site) or even be dropped from the index (an extreme case). Regardless, it would not reflect well on your site’s expertise, authoritativeness and trustworthiness (Google’s E-A-T indicators of a quality site).
In this instance, it’s best to remove the spam content and then create unique, original content. With a manual penalty, you would need to submit your site for reconsideration after you have made those improvements.
Having a good understanding of duplicate content is the best way to prevent it and repair any existing issues on your site.
If you’d like help identifying the issues hindering your website, contact us today for a consultation.