What is Duplicate Content & How To Avoid It!


To offer the best search experience, Google tries its best to filter out as much duplicate content as possible. Why? Because users aren’t interested in seeing multiple pages of search results with similar or identical text.

Duplicate content is similar or identical content that appears in more than one location on the Internet, i.e. across several websites or web pages.

When there are several copies of the same content on the Internet, search engine spiders cannot tell:

  • which version is most likely to be the original, or the best match for a given search query
  • which page to direct the link metrics to (trust, authority, anchor text, link juice etc.)

Therefore, the page with the highest authority and trust is usually the one shown, even though it may not be the original source. This is decided by Google’s algorithm.

If you continuously republish content from other sites, such as posts, press releases, news stories or product descriptions, your web pages will struggle to rank in Google’s SERPs and you may be hit by the Google Panda update.

What are the causes of duplicate content?

Duplicate content has several causes. Here are two common ones:

URL Parameters
One cause of duplicate content is the use of URL parameters. URL parameters are values set dynamically in a page’s URL, typically in the query string after the “?”, which the page’s template and data sources can read.

For example:

A customer searches for “skirts” on a clothing website and has the option of filtering or sorting the results by brand, colour, size, price etc. If the page displays 7 skirts and a different URL is generated when those items are sorted by price as opposed to colour, you essentially end up with 2 pages with the same content at different URLs:

http://www.example.com/products/womens/skirts/black.htm

http://www.example.com/products/womens?category=skirts&colour=black

When Google detects duplicate content from pages in the example above, it groups the duplicate URLs together and selects which might be the best URL to display in the search results.

To avoid URL parameter duplication, use the canonical link element (rel="canonical"). Add it to each parameterised variation of a page, pointing to the preferred URL, for example the main category page. This tells Google which version to index and helps ensure that the link weight is all passed in the right direction.
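As a rough illustration of the idea, the Python sketch below builds a canonical link for a parameterised URL by dropping the parameters that only change the sort order. The parameter names in NON_CANONICAL_PARAMS and the function names are assumptions made for this example; which parameters are safe to drop depends entirely on how your own site uses them.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only re-order a listing without changing
# which items appear on the page. Adjust for your own site.
NON_CANONICAL_PARAMS = {"sort", "order"}


def canonical_url(url: str) -> str:
    """Return the URL without sort-only parameters or a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CANONICAL_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


def canonical_tag(url: str) -> str:
    """Build the <link> element to place in the <head> of the variant page."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_tag(
    "http://www.example.com/products/womens?category=skirts&sort=price"
))
# prints: <link rel="canonical" href="http://www.example.com/products/womens?category=skirts">
```

Every sorted variation of the listing would then carry the same canonical link, so the variations point at one preferred URL.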

Printer Friendly Pages

If you create printer-friendly pages and link to them from your site, in most cases Google will find them and try to index them. The question is which version Google will show: the one that is supposed to gain trust, authority and link juice, or the one with just your article? To avoid duplicate content from printer-friendly pages, either use a print style sheet instead of separate pages, or block the printer-friendly pages using a robots.txt file.
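If you go the robots.txt route, it is worth checking that the rules block what you expect. The sketch below uses Python’s standard urllib.robotparser module for a quick check. It assumes, purely for illustration, that the printer-friendly pages live under a /print/ directory; the rules and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: assumes every printer-friendly page
# is served from a path beginning with /print/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(is_crawlable("http://www.example.com/print/duplicate-content.htm"))  # False
print(is_crawlable("http://www.example.com/blog/duplicate-content.htm"))   # True
```

Note that robots.txt controls crawling. A blocked page can still be listed if other sites link to it, which is one reason a print style sheet is often the simpler option.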

 

How to avoid duplicate content

There are some basic fixes for duplicate content. Some of them are listed below:

  • Use 301 redirects from the duplicate page to the original page
  • Use the rel="canonical" link on the non-canonical page, pointing to the canonical one
  • Use the noindex,follow meta robots tag to keep duplicate pages out of the index
  • Minimise boilerplate repetition
  • Syndicate content carefully
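
To make the first fix more concrete, here is a minimal sketch of a 301 redirect written as a small Python WSGI application. The REDIRECTS mapping and both paths are made up for this example, and in practice redirects are usually configured in the web server or CMS rather than in application code.

```python
# Hypothetical mapping from duplicate paths to the original page.
REDIRECTS = {
    "/print/duplicate-content.htm": "/blog/duplicate-content.htm",
}


def app(environ, start_response):
    """Send a 301 for known duplicate paths, otherwise serve the page."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 signals a permanent move, so the original page is the one
        # search engines should keep.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>Original content</p>"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serves on http://localhost:8000/ for local experimentation only.
    with make_server("", 8000, app) as server:
        server.serve_forever()
```

Requesting /print/duplicate-content.htm from this server returns a 301 with a Location header pointing at /blog/duplicate-content.htm, while any other path is served normally.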

For more information about duplicate content or if you are interested in Digital Marketing, please fill out the contact form or call In Front Digital on 0121 454 0279.

 
