What Is Duplicate Content?
The phrase “duplicate content” sends shivers down the spine of many people with a limited understanding of SEO. If you are one of them, rest assured: Google does not penalize companies for having duplicate content on their websites, except in the rare cases where black hat SEO practitioners create it deliberately to manipulate search results.
How Duplicate Content Comes About
Duplicate content is a naturally occurring phenomenon in websites. There are many valid reasons for having duplicate content on your site, including:
- HTTP and HTTPS versions of a URL — this results in duplicate content as far as Google crawlers are concerned, even though human readers might never notice.
- www and non-www versions of a URL.
- Internal site navigation with internal search that allows users to sort products by size, color, etc. This creates multiple versions of the page depending on how the user sorts the data. On large e-commerce sites, there could be hundreds of essentially duplicate pages for a single product group.
- Different URLs that are driven by tracking software. If, for instance, a company wants to know which online ads are sending users to the website, it will create a unique URL for the website’s landing page for each ad source.
- Printer-only and mobile versions of a web page.
- Versions of a web page for different regions or languages, especially regional variants that share the same language and most of the same text.
- Content that is scraped without your permission, or syndicated with it, on other websites and blogs. Normally Google strives to rank the original version of the page, but frequent copying can muddy the waters.
All of these situations are valid reasons to have duplicate content. Keep in mind, too, that content doesn’t have to appear in the same order on the page for Google crawlers to consider it duplicate — important for e-commerce businesses to understand in terms of the sorting issue described above.
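To make the list above concrete, here is a minimal sketch of how several of those variants (protocol, www, tracking and sorting parameters) can collapse into a single preferred URL. The domain, the parameter names in `IGNORED_PARAMS`, and the choice of https plus www as the preferred form are all assumptions for illustration; a real site would substitute its own.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names that only track or re-sort a page.
# A real site would list the parameters it actually uses.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sort", "order"}


def canonical_url(url: str) -> str:
    """Collapse common duplicate variants of a URL into one preferred form."""
    parts = urlsplit(url)

    # Assumption: the preferred version is https with a www hostname.
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host

    # Keep only the parameters that change what the page is about.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    query.sort()  # a stable order, so ?a=1&b=2 and ?b=2&a=1 match

    path = parts.path or "/"
    return urlunsplit(("https", host, path, urlencode(query), ""))


variants = [
    "http://example.com/shoes?sort=price",
    "https://www.example.com/shoes?utm_source=ad1",
    "https://example.com/shoes?sort=size&utm_campaign=spring",
]
unique = {canonical_url(u) for u in variants}
print(unique)  # all three variants map to the same URL
```

Running something like this over a crawl or a server log is one way to estimate how many distinct URLs on a site are really the same page.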
SEO Problems With Duplicate Content
Even though you won’t be penalized for having duplicate content, it can cause problems for your SEO results.
- If Google can’t figure out which version of a page to display in its search results, it may not rank any version as well as it deserves.
- Another problem is that Google may show a less-than-ideal version of a page to users, which could severely reduce click-throughs and conversions.
- Google may have a hard time figuring out which version of a page should get “credit” for the external and internal links pointing to it, or whether the “credit” should be split among the various duplicate pages. Inbound links are very important for rankings, so if the version of the page you really want Google to rank has to share its link credit with duplicates, its ranking position could drop significantly.
The real question from an SEO perspective is not what is duplicate content, but rather: What can we do about duplicate content?
SEO Fixes for Duplicate Content
Sometimes, Google can figure out which version of a page to rank without any action on your part. Sometimes, you can tell Google which version to rank. All of the SEO workarounds for duplicate content have advantages and disadvantages, so your best move is to partner with a firm with a thorough understanding of web development and SEO.
In general, your best options to thwart duplicate content issues in your SEO campaign are:
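- Use Google Search Console to see which version of a URL Google has selected as canonical. Its older preferred domain setting, which told Google to consider only www or non-www versions of URLs, has been retired, so 301 redirects to the https or www version now do that job.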
- When you publish content off-site, include a link back to your original content.
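- For sorting issues and related problems, Google used to offer a URL Parameters tool in Search Console for telling its crawlers how to handle specified versions of a page. That tool has since been retired, so the canonical and noindex options below now cover these cases.
- Add a canonical link element (rel="canonical") to the duplicate versions, pointing to the version of the page you want Google to rank. Google treats this as a strong hint rather than a command, and uses it to consolidate pages with duplicate or similar content under the preferred URL.
- You can give certain versions of a page a “noindex” tag, which tells Google not to include them in its search results. Google still has to crawl a page to see the tag, so noindex does not stop crawling.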
- Identify and evaluate all duplicate content on your website. Is it really necessary? Are there better options for setting up internal search or tracking to cut down on or eliminate duplicate URLs? This exercise can be quite fruitful not only for improving the SEO campaign, but also for improving the user experience, by cutting down or getting rid of incomprehensible URLs and other confusing complications.
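As a rough illustration of the canonical and noindex options in the list above, the sketch below builds the two HTML tags that would go in a page's head section. The function names and the example URL are hypothetical; the tag formats themselves are the standard rel="canonical" link element and robots meta tag. The two tags are meant for different pages: pairing noindex with a canonical link on the same page sends conflicting signals.

```python
from html import escape


def canonical_link_tag(preferred_url: str) -> str:
    """Tag for a duplicate page, pointing at the version you want ranked."""
    # Escape the URL so characters such as & stay valid inside the attribute.
    href = escape(preferred_url, quote=True)
    return f'<link rel="canonical" href="{href}">'


def noindex_meta_tag() -> str:
    """Tag for a page that should stay out of search results entirely."""
    return '<meta name="robots" content="noindex">'


# A sorted product listing points back to the unsorted listing.
print(canonical_link_tag("https://www.example.com/shoes"))

# A printer-only version might be kept out of the index instead.
print(noindex_meta_tag())
```

Which tag fits which page is a judgment call: canonical suits variants that should pass their link credit to a main page, while noindex suits pages with no search value at all.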
Seldom is it a good idea to block Google from crawling duplicate versions of web content. Doing this makes it difficult for Google to evaluate the true importance of your main page, or to understand how various versions of the page relate to different search queries. Google’s technology is quite sophisticated in assessing duplicate content, and usually does a good job of picking the right version for a given search.
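One way to check whether a site is blocking its own duplicate pages is to test their URLs against its robots.txt rules. The sketch below uses Python's standard `urllib.robotparser` for that. The robots.txt content and the URLs are made up for illustration, and Python's parser does plain prefix matching, so it will not reproduce every pattern Google's crawler understands.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /print/
"""


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt rules keep crawlers away from."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


duplicates = [
    "https://www.example.com/shoes?sort=price",
    "https://www.example.com/print/shoes",
]
blocked = blocked_urls(ROBOTS_TXT, duplicates)
print(blocked)  # the /print/ version is blocked, so Google cannot see its tags
```

Any duplicate that shows up in the blocked list is one Google cannot crawl, which means it also cannot read a canonical link or noindex tag placed on that page.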