Like Me? Follow Me.
There are two previous posts on the I-COM blog here and here that talk about duplicate content in some detail. I won’t go too deep into the copywriting side of the issue, other than to reinforce the point that you should write your own content and never copy from another website or from another page of your own website!
As a refresher – duplicate content is essentially when the exact same (or very similar) content is on two different web-pages, either on the same website or otherwise.
The reason it is a problem in SEO terms is that the users, and therefore search engines, want to see unique content served up on websites, not the same stuff over and over.
Paul’s post went through the copywriting reasons, but I’ll be looking at the technical reasons why whole pages or even whole websites can be unknowingly duplicated and how it can be easily prevented.
How Does Page Duplication Happen?
Duplication of content can happen in two ways – duplication of a chunk of content such as a paragraph or sentence, or the duplication of entire pages. The page duplication happens when a search engine crawls a website and finds more than one version of a page, based on its URL.
As human users, we see a web-page as just that – a page to help us on our user journey – and we are unlikely to care (or probably even notice) whether that page exists on more than one version of a URL. On the flip-side, a search engine sees URLs: if multiple URLs serve the same page, it sees more than one version of that page, and therefore duplicate content.
The Most Common Issues and Solutions
Some of the most common duplication issues, and solutions, are below:
WWW and non-WWW Canonical Issues
As I’m a big festival fan I regularly visit the website eFestivals. When I type in their URL I just want to get to their website. As a user, I don’t care whether the URL is www.efestivals.co.uk, or efestivals.co.uk without the WWW. However, search engines actually see those two URLs as different versions of the Homepage.
Because the internal links on each version stay in that version’s format – with the WWW on one and without it on the other – search engines effectively see two complete copies of the same website.
The solution is simple – the website owner needs to decide which version to use, which would usually be the with-WWW version as inbound links are usually in that format. Once this decision has been made, their developers can apply a rule in the site’s .htaccess file which will automatically redirect all pages to the correct version. They must also ensure all links within the site point to the chosen version. This eradicates the duplication.
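As an illustrative sketch (assuming an Apache server with mod_rewrite enabled, and using the eFestivals domain mentioned above purely as an example), the .htaccess rule could look something like this:

```apache
# Turn on the rewrite engine (assumes mod_rewrite is enabled)
RewriteEngine On

# If the request came in on the non-WWW hostname...
RewriteCond %{HTTP_HOST} ^efestivals\.co\.uk$ [NC]

# ...send a permanent (301) redirect to the WWW version,
# preserving the rest of the URL path
RewriteRule ^(.*)$ http://www.efestivals.co.uk/$1 [R=301,L]
```

The 301 (permanent) redirect is the important detail here – it tells search engines that the non-WWW version has moved for good, so any link value is consolidated onto the chosen version.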
Inconsistent URL Formats
Similar to the WWW and non-WWW issue above, inconsistent URL formats can also create duplicated pages. This can include:
- Extensions – such as .aspx, .html, .php etc. – if a page is linked to with an extension in one place and without an extension in another, it can create duplicated pages.
- Capitalisation – e.g. /SEO and /seo – if two otherwise-identical URLs differ only in capitalisation, search engines will see them as different pages, despite it making no difference to the user.
The easiest solution is to pick one URL format (all lowercase, and either consistently with an extension or without) and ensure that all links on the website use that format. If an issue has been identified, the developers will be able to write rules to fix it.
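For the capitalisation case, a commonly documented Apache approach uses a RewriteMap to lowercase any URL containing capital letters. As a rough sketch – note that the RewriteMap line must be defined in the main server or virtual host configuration, not in .htaccess:

```apache
# In the server or virtual host config:
# define a map named "lc" using Apache's built-in tolower function
RewriteMap lc int:tolower

# In the relevant config context:
RewriteEngine On

# If the requested path contains any uppercase letter...
RewriteCond %{REQUEST_URI} [A-Z]

# ...301-redirect to the all-lowercase equivalent of the same path
RewriteRule (.*) ${lc:$1} [R=301,L]
```

A similar redirect rule can enforce the chosen extension format, so that every page resolves at exactly one URL.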
Query Strings
Everyone has seen query strings on websites: they appear at the end of a URL, usually after a question mark, and can play an important role in serving up separate versions of a page based on parameters such as filters on an e-commerce site.
Here is an example from one of our clients’ websites - http://www.ski-trek.co.uk/trek/outdoor-footwear/walking-boots-1.html?floor=723&price=1%2C100. This is the Walking Boots page http://www.ski-trek.co.uk/trek/outdoor-footwear/walking-boots-1.html, but with query strings on the end that narrow the products down to the Men’s section and the “up to £100” price range.
On websites where SEO has not been taken into consideration, this can lead to many duplicated URLs. While search engines – and Google especially – are sometimes intelligent enough to work out that it is a query string, there are still many occasions where these URLs end up in the index.
As our developers do take SEO issues into account when implementing a website, they add a tag called rel=canonical to the query-string versions of the URL. On a basic level this tells search engines that this is not the main version of the page and that, in the eyes of the search engine, the real version of the URL is the one without the query strings – http://www.ski-trek.co.uk/trek/outdoor-footwear/walking-boots-1.html. While the search engines may have a grasp on your site, it is better to be safe than sorry.
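In practice the canonical tag is a single line in the page’s &lt;head&gt;. For the walking boots example above, the filtered (query-string) version of the page would carry something like:

```html
<!-- Placed in the <head> of the filtered, query-string version of the page -->
<!-- Points search engines at the clean URL as the "real" version -->
<link rel="canonical" href="http://www.ski-trek.co.uk/trek/outdoor-footwear/walking-boots-1.html" />
```

The filtered page still works normally for users; the tag simply tells search engines which URL should receive the ranking credit.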
While duplicated pages can be catastrophic to a website’s SEO performance, especially if it happens on a large scale, it is not too complicated to fix.
I-COM’s SEO Consultants and Website Development team work together to ensure that all websites we build take into account these issues and many others.