The Issues Behind Duplicate Website Content


To give its users the best possible search experience, Google tries its best to filter out as much duplicate content as possible. Why? Because searchers aren’t interested in seeing page after page of results filled with exactly the same text.

Duplicate content can be defined as large blocks of identical or nearly identical content published across multiple domains or on multiple pages of the same website. If you publish a piece of writing on one site or domain and then upload exactly the same content to another website, you have created duplicate content.

From an SEO perspective, this can be a cause for concern because of the way search engines perceive and deal with duplicate content. If your business website has problems with duplicate content, it can have a negative impact on the SEO performance of your domain as a whole.

When multiple copies of the same content are published online, search engine bots can struggle to determine which version is most likely to be the original, and which is most relevant to the search query.

In these cases, Google’s algorithm decides which version to show, and it may favour the page with the highest authority, even if that page is just a low-quality rehash of the original article.

If your business website regularly republishes content from other sites, such as informative articles, press releases, news stories and even product descriptions, your web pages will struggle to rank in the search results.

So, how do these duplicate content issues occur, and how can they be avoided?

How Do Duplicate Content Issues Occur?

There are two main ways in which duplicate content issues can develop. The first is simply cloning content: copying blocks of text from one page to another instead of putting in the effort to write unique text for each page. The second, and more common, cause relates to technical issues on your business website.

Content Management Systems Creating Duplication

It is important to choose a content management system (CMS) that you understand and can operate confidently, because these systems are a known source of content duplication. A CMS can display the same piece of content in several different places. For example, a single blog post might appear on the blog’s homepage, in the archives, on its category page and on the author’s own archive page, each at a different URL.

Canonicalisation

Canonical issues are likely one of the most common causes of duplicate content. URLs that appear, at first glance, to point to the same content can look like completely different pages from the perspective of the search engines. For example, each of the following URLs can be viewed by the search engines as a separate web page:

www.website.com

website.com

www.website.com/index.php

website.com/home.asp

Unless redirects or canonical tags are in place, these four URLs will most likely be treated as duplicates of one another, and therefore be considered to contain duplicate content.

This is not only true of a website’s homepage; it can happen on any page that can be accessed via more than one URL. For example, the following URLs could also be identified as duplicate content by the search engines:

website.com/page

www.website.com/page

https://www.website.com/page

https://website.com/page

This makes it vital that your business identifies and resolves any duplicate content issues occurring on your website.

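To make the idea concrete, here is a minimal Python sketch that maps each of the URL variants above onto one preferred address. It assumes, purely for illustration, that the preferred version is `https://www.website.com` and that `index.php` and `home.asp` are nothing more than aliases for the homepage. A real site would substitute its own choices. A normaliser like this doesn’t fix anything on its own, but it is a handy way to audit which addresses ought to redirect or canonicalise to which.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preferred version of the site (illustrative only).
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.website.com"
# Assumed aliases that serve the homepage.
HOMEPAGE_ALIASES = {"/index.php", "/home.asp"}


def bare_host(host: str) -> str:
    """Drop a leading 'www.' so www and non-www hosts compare equal."""
    return host[4:] if host.startswith("www.") else host


def normalise(url: str) -> str:
    """Map a variant of one of this site's URLs onto its preferred form."""
    url = url.strip()
    # urlsplit only recognises a host when a scheme is present,
    # so give bare addresses like "website.com/page" one.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lower-cased
    if bare_host(host) == bare_host(PREFERRED_HOST):
        host = PREFERRED_HOST
    path = parts.path or "/"
    if path in HOMEPAGE_ALIASES:
        path = "/"
    # Fragments (#...) never reach the server, so they are dropped.
    return urlunsplit((PREFERRED_SCHEME, host, path, parts.query, ""))


homepage_variants = [
    "www.website.com",
    "website.com",
    "www.website.com/index.php",
    "website.com/home.asp",
]
print({normalise(u) for u in homepage_variants})  # {'https://www.website.com/'}
```

All four homepage variants collapse into a single address, and the same happens for the four `/page` variants. That single address is the one the other versions should point to.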
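The implementation of 301 redirects is a common solution to this issue, and one recommended by Google itself. Redirect every variant to a single preferred version of the URL, and use that preferred version consistently in all of your internal linking. Older versions of Google Search Console also offered a preferred domain setting, but it has since been retired. Redirects, canonical tags and consistent internal links are now the main ways to signal which version you prefer.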
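In practice the redirect is usually configured in the web server, CDN or CMS. The logic is simple enough, though, to show in a few lines. The Python sketch below is a WSGI middleware that answers any request not already on the preferred origin with a 301 pointing at the same path there. The preferred host `www.website.com` is an assumption for illustration, and `site` is a placeholder standing in for the real application.

```python
# Assumed preferred origin for this illustration.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.website.com"


def redirect_to_preferred(app):
    """Wrap a WSGI app so requests on any other scheme/host get a 301."""

    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        scheme = environ.get("wsgi.url_scheme", "http")
        if host == PREFERRED_HOST and scheme == PREFERRED_SCHEME:
            return app(environ, start_response)  # already the preferred URL
        path = environ.get("PATH_INFO") or "/"
        location = f"{PREFERRED_SCHEME}://{PREFERRED_HOST}{path}"
        query = environ.get("QUERY_STRING", "")
        if query:
            location += "?" + query  # keep parameters so deep links survive
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]

    return middleware


def site(environ, start_response):
    """Placeholder for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello from the preferred URL"]


application = redirect_to_preferred(site)
```

If HTTPS is terminated at a proxy or CDN, `wsgi.url_scheme` will usually read `http` even for secure requests. In that case the scheme check would need to read whatever forwarded header the proxy sets, which is one reason this job is often left to server configuration. A production version would also need to handle details such as `SCRIPT_NAME` and percent-encoded paths.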

URL Parameters

Another technical issue that can cause duplicate content is the use of URL parameters. URL parameters are values set dynamically in a page’s URL, usually as key-value pairs after a question mark (for example ?colour=black), which the page’s template and data sources can read to filter, sort or track what is shown.

For example, when a potential customer searches for “skirts” on a clothing website, they can filter or sort their results by brand, colour, size, price and so on. If the page displays seven skirts, sorting them by price generates a different URL from sorting them by colour, so you end up with two URLs showing the same content. The same listing can also be reachable through both a static and a parameterised URL:

http://www.example.com/products/womens/skirts/black.htm

http://www.example.com/products/womens?category=skirts&colour=black

When Google detects duplicate content on pages like the examples above, it groups the duplicate URLs together and selects the one it considers best to display in the search results, which may not be the one you would have chosen.
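Google doesn’t publish exactly how it clusters duplicates, and its real process also catches near-duplicates. The basic idea can still be illustrated with a simple exact-match check. This Python sketch fingerprints each page’s text after collapsing whitespace and case and groups URLs that share a fingerprint. It then picks one URL per group using a made-up tie-break: prefer a URL without a query string, then the shortest. The page contents are invented for the example, and none of this is Google’s actual method.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(page_text: str) -> str:
    """Hash the text after collapsing whitespace and case (exact-match only)."""
    normalised = re.sub(r"\s+", " ", page_text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def group_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of two or more URLs whose content fingerprints match."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


def pick_representative(urls: list[str]) -> str:
    """Hypothetical tie-break: no query string first, then shortest, then alphabetical."""
    return min(urls, key=lambda u: ("?" in u, len(u), u))


# Invented page contents for the example URLs in the text.
pages = {
    "http://www.example.com/products/womens/skirts/black.htm":
        "<h1>Black skirts</h1>  <p>7 items</p>",
    "http://www.example.com/products/womens?category=skirts&colour=black":
        "<h1>Black skirts</h1>\n<p>7 items</p>",
    "http://www.example.com/products/womens/dresses.htm":
        "<h1>Dresses</h1> <p>12 items</p>",
}

for group in group_duplicates(pages):
    print(pick_representative(group), "<-", group)
```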

To avoid URL parameter duplication, use the canonical link element. Add a rel="canonical" link to the head of each parameterised or sorted version of a page, pointing to the main category URL. This tells Google which version you prefer, making it less likely to show a duplicate, and helps ensure that link weight is passed in the right direction. Note that Google treats the canonical tag as a strong hint rather than a strict directive.
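As a rough illustration, the Python sketch below derives a canonical URL for a parameterised category page. It drops parameters that only change presentation, puts the rest in a fixed order, and then renders the `<link>` element that would go in the page’s `<head>`. The parameter names `sort`, `order` and `view` are hypothetical. Which parameters actually change the content depends entirely on how your site is built.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that reorder or restyle results without changing them.
PRESENTATION_PARAMS = {"sort", "order", "view"}


def canonical_url(url: str) -> str:
    """Drop presentation-only parameters and sort the rest into a fixed order."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in PRESENTATION_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Render the <link rel="canonical"> element for the page at `url`."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


print(canonical_link_tag(
    "http://www.example.com/products/womens?sort=price&category=skirts"
))
# <link rel="canonical" href="http://www.example.com/products/womens?category=skirts">
```

Sorting the kept parameters means `?category=skirts&colour=black` and `?colour=black&category=skirts` produce the same canonical URL. If a clean static URL such as `/products/womens/skirts/black.htm` already exists for the same listing, the canonical could point there instead. That mapping is site-specific and isn’t shown here.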

Printer-Friendly Pages

If you create printer-friendly pages and link to them from your site, in most cases Google will find them and try to index them. The question is, which version will Google show: the main page you want to build trust and authority, or the stripped-down print version that displays only your content? To avoid this problem, either use a print style sheet, so that no separate printer-friendly page is needed, or block the printer-friendly versions using your robots.txt file.
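If you go the robots.txt route, it is worth checking that the rule blocks what you intend and nothing else. Python’s standard `urllib.robotparser` module can evaluate a robots.txt file locally. The sketch below assumes, hypothetically, that the printer-friendly versions all live under a `/print/` path. Bear in mind that robots.txt stops compliant crawlers from fetching those URLs, but a blocked URL can still end up indexed if other pages link to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: printer-friendly copies are assumed to live under /print/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """True if the rules above allow `user_agent` to fetch `url`."""
    return robots.can_fetch(user_agent, url)


for url in (
    "https://www.website.com/blog/duplicate-content",
    "https://www.website.com/print/blog/duplicate-content",
):
    print(url, "->", "crawlable" if is_crawlable(url) else "blocked")
```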

How Can You Avoid Duplicate Content?

There are many quick and easy fixes for duplicate content, some of which are listed below:

  • Use 301 redirects from the ‘duplicate’ content page to the original content page
  • Use the rel="canonical" link from the non-canonical page to the canonical one
  • Use noindex,follow meta tags to prevent pages from being indexed
  • Minimise boilerplate repetition
  • Syndicate content carefully
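Once fixes like these are in place, it helps to spot-check the tags on individual pages. The Python sketch below uses the standard library’s `html.parser` to pull out a page’s `rel="canonical"` link and robots meta directives. The sample pages are invented. A real audit would fetch live HTML and would also check HTTP headers such as `X-Robots-Tag`, which this sketch ignores.

```python
from html.parser import HTMLParser
from typing import Optional


class DuplicateSignals(HTMLParser):
    """Collect the canonical URL and robots directives declared in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None
        self.robots: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values may be None.
        attributes = {name: value or "" for name, value in attrs}
        rel_values = attributes.get("rel", "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.canonical = attributes.get("href")
        elif tag == "meta" and attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            self.robots = [d.strip().lower() for d in content.split(",")]


def audit(html_text: str) -> dict:
    """Summarise the duplicate-content signals found in one HTML document."""
    parser = DuplicateSignals()
    parser.feed(html_text)
    parser.close()
    return {
        "canonical": parser.canonical,
        "noindex": "noindex" in parser.robots,
        "follow": "nofollow" not in parser.robots,  # follow is the default
    }


product_page = '<head><link rel="canonical" href="https://www.website.com/page"></head>'
tag_archive = '<head><meta name="robots" content="noindex, follow"></head>'
print(audit(product_page))  # {'canonical': 'https://www.website.com/page', 'noindex': False, 'follow': True}
print(audit(tag_archive))   # {'canonical': None, 'noindex': True, 'follow': True}
```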

For expert advice on how to resolve duplicate content issues and improve your business’ SEO & website performance, get in touch with our Digital Marketing Agency in Birmingham and learn more about how we can help.
