Webpages with duplicate content are one of the most common Search Engine Optimization (SEO) issues encountered today. Duplicate content can hurt your WebStore's ranking in search results, yet it remains one of the toughest problems for webmasters to solve. In this post, we will show you how to leverage one of our recent patches to tackle this problem.
1. Consequences of Duplicate Content Pages
Webpages with duplicate content cause at least three major problems for search engines:
- Since there are potentially several versions of the same page, it is difficult for search engines to identify which version should be indexed;
- Similarly, search engines cannot work out which version of the page should be ranked in search results;
- Search engines cannot tell whether to consolidate link metrics onto one page or keep them split across multiple versions.
2. Types and Examples of Duplicate Content Pages
In this post, we will consider two types of pages with duplicate content:
- Any page that is requested with Google AdWords parameters;
- Pages that include product listings, such as department and category pages.
In the first case, a link can look like this:
http://www.yourstore.com/dept?gclid=CKTf7smRu7sCFcEnpQodSDYAVA
As we can notice, this link is augmented with the Google Click ID (gclid) parameter, which originates from Google AdWords links. Another version of the same page can be viewed at this URL:
http://www.yourstore.com/dept
In this situation, the search engine will not know which version should be included in search results, as previously stated. We will call this type of page with duplicate content a hard duplication.
In the second case, let us suppose, without loss of generality, that in a given department the products are listed across several pages, each assigned a number, with URLs of the following form:
http://www.yourstore.com/dept?page=1
Another example of hard duplication occurs with persistent-filtered search pages; these types of pages were covered in this post.
For paginated listings, on the other hand, we clearly need to tell the search engine that those pages are in fact related to each other so that they can be appropriately displayed in search results. We will refer to this kind of duplication as soft duplication. The next section will show you how to address these two types of duplication.
3. Enter Our Solution
The central part of our technique relies on including appropriate HTML tags in the head of each page in your store that has potential duplication issues. These HTML tags are precomputed and stored in NitroScript variables; all you have to do is include those variables in your header template so that the tags end up in the final page.
For filtered search pages, you can specify the canonical URL by including the following code snippet in your header template:
{if (pageproperty['pageid'] eq 'filtered')}
  {ifThereAre pfscanonical}
    {forEach pfscanonical}
      <link rel="canonical" href="{pfscanonical['url']}"/>
      <meta property="og:url" content="{pfscanonical['url']}"/>
    {endForEach}
  {endIfThereAre}
{endIf}
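For illustration, on a filtered search page this snippet emits a pair of tags in the head of the page for each entry of the precomputed pfscanonical variable, of the form:
<link rel="canonical" href="CANONICAL URL OF THE FILTERED PAGE"/>
<meta property="og:url" content="CANONICAL URL OF THE FILTERED PAGE"/>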
To specify the canonical URL for pages requested via Google AdWords links, include this code in your header template:
{if (pageproperty['crawled_parameters_canonical_url'])}
  <link rel="canonical" href="{pageproperty['crawled_parameters_canonical_url']}"/>
  <meta property="og:url" content="{pageproperty['crawled_parameters_canonical_url']}"/>
{endIf}
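Going back to the AdWords example of Section 2, and assuming the precomputed crawled_parameters_canonical_url property resolves to the clean department URL, the gclid-augmented page would end up with the following tags in its head, pointing search engines back to the version that should be indexed:
<link rel="canonical" href="http://www.yourstore.com/dept"/>
<meta property="og:url" content="http://www.yourstore.com/dept"/>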
Finally, to address soft duplication, include this code:
{forEach linkdata}
  {if ((linkdata['rel'] ne 'canonical') || (pageproperty['crawled_parameters_canonical_url'] eq ''))}
    <link rel="{linkdata['rel']}" href="{linkdata['href']}"/>
  {endIf}
{endForEach}
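As an illustration, on the second page of the paginated department listing from Section 2, and assuming the precomputed linkdata variable carries rel="prev" and rel="next" entries for that page, this loop would emit pagination tags along these lines (the exact URLs are computed by the platform):
<link rel="prev" href="http://www.yourstore.com/dept?page=1"/>
<link rel="next" href="http://www.yourstore.com/dept?page=3"/>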
To see these fixes in action, open a listing page, for instance; you should notice in the head of the resulting HTML that tags such as the following were added:
<link rel="canonical" href="LINK TO THE PAGE"/>
If the page is also prone to soft duplication issues, you will notice additional tags of the form:
<link rel="next" href="LINK OF THE NEXT PAGE"/>
These links are computed dynamically and therefore differ from one page to another.
4. Bibliographic Notes
In this section, we provide links to webpages that we think will be of great interest to you.
- The original post by Matt Cutts, the head of Google’s Webspam team, on URL canonicalization
- An excellent blog post by Dr. Peter J. Meyers on duplicate content
- For SEO newcomers, here is an excellent PDF booklet on SEO 101
- Google SEO Starter Guide available at this address
We hope you have found this post useful; let us know your feedback.
This feature is currently only available in our Beta and Alpha channels.