WordPress SEO Issues – Managing Duplicate Content Effectively
This is the fifth part of An Essential Guide To WordPress SEO – so far we’ve learned about the importance of keyword research, getting high authority backlinks, and a number of techniques that can be used to make your WordPress site more attractive to the search engines.
In this post I want to focus on something that can significantly influence your rankings (in a negative way) – duplicate content! A standard WordPress installation doesn’t deal with duplicate content particularly well and it’s something that you’ll need to address to minimize the impact on your rankings.
What is Duplicate Content?
This might seem obvious, but it’s worth clarifying up front – by duplicate content, I mean content on your site that also appears in at least one other place on the Web. The other copy could be on your own site or on a different one – either way, it’s not ideal from an SEO perspective and your rankings could suffer if the search engines believe that you have duplicate content.
How Does This Affect My WordPress Site?
Duplicate content is a very common issue with WordPress blogs – this is primarily because of the way WordPress structures your site. The ability to easily categorize your posts and pages, add tags, and have archives of everything makes it much easier to manage your content and improves the browsing experience for your readers.
However, there is a downside – it also means that copies of your content can be reached at a number of different locations. For example, consider that you have just published a blog post called “10 Great WordPress Themes” – you add it to a category called “WordPress Themes” and you also add a couple of tags (e.g. “themes” and “review-articles”). This content will then appear in (at least) the following places on your site:
http://www.yoursite.com/blog/10-great-wordpress-themes [your original post]
http://www.yoursite.com/blog/wordpress [the category page your post is stored under]
http://www.yoursite.com/blog/themes [the first tag you attached to your post]
http://www.yoursite.com/blog/review-articles [the second tag you used]
http://www.yoursite.com/blog/2009/12/04 [the daily archive for your site]
http://www.yoursite.com/blog/2009/12 [the monthly archive]
http://www.yoursite.com/blog/2009/ [the annual archive]
http://www.yoursite.com/blog/author-name [your author archive]
As you can see, this is a lot of duplicate content and it may well be affecting your site right now! This problem can be particularly damaging if you use lots of tags when writing your posts as your content will appear on a separate page for each tag that you use. However, there are steps we can take to address this and get everything nice and tidy …
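To make the scale of the problem concrete, here is a minimal sketch that rebuilds the list above for a single post. The function name and its parameters are made up for illustration (this is not a WordPress API), and the URL patterns simply mirror the simplified example addresses used in this post rather than any particular permalink setting.

```php
<?php
// Hypothetical helper (not a WordPress function): lists the kinds of URLs
// from the example above where a single post's content can show up.
function example_listing_urls($base, $slug, $category, array $tags, $date, $author)
{
    list($year, $month, $day) = explode('-', $date); // expects YYYY-MM-DD

    $urls = array("$base/$slug", "$base/$category");
    foreach ($tags as $tag) {
        $urls[] = "$base/$tag";          // one extra listing page per tag
    }
    $urls[] = "$base/$year/$month/$day"; // daily archive
    $urls[] = "$base/$year/$month";      // monthly archive
    $urls[] = "$base/$year/";            // annual archive
    $urls[] = "$base/$author";           // author archive

    return $urls;
}

$urls = example_listing_urls(
    'http://www.yoursite.com/blog',
    '10-great-wordpress-themes',
    'wordpress',
    array('themes', 'review-articles'),
    '2009-12-04',
    'author-name'
);

// Six fixed locations plus one more for every tag attached to the post.
echo count($urls) . " URLs carry this post\n";
```

The point of the sketch is the loop over tags: every tag you add creates one more address where the same content can be found.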
Meta Tags
Robots meta tags (the ones that carry values such as noindex and nofollow) are typically found in the header.php file of your site’s theme – they often look something like the following:
<meta name="googlebot" content="index,archive,follow" />
<meta name="msnbot" content="all,index,follow" />
<meta name="robots" content="all,index,follow" />
These tags inform the search engines (in particular Google and MSN) that the current page can be processed and added to their index, and that all links on that page can be followed. This is exactly what we want for our blog posts – that is, we want the search engines to index our posts so that we can start getting some traffic. However, we probably don’t want all the category, tag, and archive pages to be indexed as well, as these will potentially be recognized by the search engines as duplicate content. We can stop this from happening through the use of a PHP conditional statement and by changing the meta tag values – to do this, place the following code into your header.php file:
<?php if(is_single() || is_page() || is_home()) { ?>
<meta name="googlebot" content="index,archive,follow,noodp" />
<meta name="robots" content="all,index,follow" />
<meta name="msnbot" content="all,index,follow" />
<?php } else { ?>
<meta name="googlebot" content="noindex,noarchive,follow,noodp" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>
What this code is essentially telling the search engines is that if the current page is a single post page, a standard page, or the home page, then it’s absolutely fine to index the content and follow all links on that page. However, if it’s any other type of page (i.e. the category, tag, or archive pages) it’s most likely duplicate content so leave it out of the index – ALTHOUGH still allow the links on that page to be followed.
That last bit is an important point – we want to allow the links to be followed on the duplicate content pages as they will still be seen as internal links and that is certainly something that can help with your site’s rankings.
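If you would rather keep this logic out of header.php, the same idea can be sketched in your theme’s functions.php using the wp_head hook. This is only an illustration under a few assumptions: the mytheme_ function names are made up, your theme calls wp_head() in its <head> section, and you have removed any hard-coded robots tags from header.php so they are not printed twice. It also prints one generic robots tag rather than the three bot-specific tags shown earlier, and uses is_singular() in place of is_single() || is_page().

```php
<?php
// Sketch for a theme's functions.php. The mytheme_ names are hypothetical.

// Returns the robots value: index real content, keep listing pages out
// of the index, and allow links to be followed in both cases.
function mytheme_robots_value($is_primary_content)
{
    return $is_primary_content ? 'index,follow' : 'noindex,follow';
}

// Prints the robots meta tag for the page currently being rendered.
function mytheme_print_robots_meta()
{
    // is_singular() is true for single posts, pages and attachments.
    $is_primary = is_singular() || is_home();
    printf(
        '<meta name="robots" content="%s" />' . "\n",
        esc_attr(mytheme_robots_value($is_primary))
    );
}

// Register the hook only when WordPress is loaded, so this file can
// also be run on its own without errors.
if (function_exists('add_action')) {
    add_action('wp_head', 'mytheme_print_robots_meta');
}
```

Keeping the decision in one small function makes it easy to adjust later, for example if you decide that category pages should be indexed after all.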
Canonical Tags
Canonical tags are useful for resolving duplicate content issues – these tags are supported by the main search engines and enable you to tell them which URL they should treat as the actual address for your content. But when might this be useful? Well, for example, assume that you have a product you’re selling on your site and that you’re allowing affiliates to sell your product for you in return for a commission. Each of your affiliates is given a unique link to your product so that you can track who has referred sales to your site. In such a case, you might have lots of different addresses pointing to the same product page:
http://www.yoursites.com/product-name.php
http://www.yoursites.com/product-name.php?affiliate-id=123
http://www.yoursites.com/product-name.php?affiliate-id=456
http://www.yoursites.com/product-name.php?affiliate-id=789
If the search engines follow one of those links back to your site they may see it as duplicate content (because you have the same content at more than one web address). The way to resolve this with canonical tags is to add the following code to the <head> section of the product page itself (on a WordPress site, make sure it is only output for that page rather than hard-coded into header.php, where it would be applied to every page):
<link rel="canonical" href="http://www.yoursites.com/product-name.php"/>
This tells the search engines that the URL you’ve specified is the original address for that product – however, it’s important to note that canonical tags are basically your “suggestion” to the search engines as to which page should be indexed. It’s likely that the search engines will act on your suggestion, but this is by no means guaranteed.
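Rather than typing the canonical address by hand, you could derive it from the requested URL by dropping the tracking parameter. The following is a rough sketch, not a definitive implementation: the function name is made up, it assumes the affiliate-id parameter from the example links above, and it expects an absolute URL that includes the scheme and host.

```php
<?php
// Hypothetical helper: strips tracking parameters so every affiliate link
// maps back to one canonical address. "affiliate-id" matches the example
// URLs above; the function name is made up for this sketch.
function example_canonical_url($url, array $tracking_params = array('affiliate-id'))
{
    $parts = parse_url($url);
    $query = array();
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query); // query string -> associative array
    }
    foreach ($tracking_params as $param) {
        unset($query[$param]);              // drop the tracking parameter
    }

    $path = isset($parts['path']) ? $parts['path'] : '/';
    $canonical = $parts['scheme'] . '://' . $parts['host'] . $path;
    if (!empty($query)) {
        // Keep any remaining parameters that really do change the content.
        $canonical .= '?' . http_build_query($query);
    }
    return $canonical;
}

$requested = 'http://www.yoursites.com/product-name.php?affiliate-id=123';
printf(
    '<link rel="canonical" href="%s" />' . "\n",
    htmlspecialchars(example_canonical_url($requested), ENT_QUOTES)
);
```

Whichever affiliate link a visitor arrives on, the tag printed in the page always points at the plain product address.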
You might want to check out this Canonical URL plugin which makes it very easy for you to manage your canonical tags.
Make Use of Excerpts
Another technique that many people use for addressing duplicate content is to use excerpts instead of the actual post content on all pages where duplicate content will potentially exist. This is a simple, but quite effective strategy – all you need to do is edit the files where your content is displayed (except single.php and page.php) and change the template tag <?php the_content(); ?> to <?php the_excerpt(); ?>. This means that only a subsection of your posts will be displayed on the pages where duplicate content would otherwise exist.
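As a rough illustration, the loop in a listing template such as archive.php might end up looking like the fragment below. The surrounding markup is made up and your theme’s loop will almost certainly differ, so treat it as a sketch of where the change goes rather than something to paste in as-is.

```php
<?php if ( have_posts() ) : while ( have_posts() ) : the_post(); ?>
    <h2><a href="<?php the_permalink(); ?>"><?php the_title(); ?></a></h2>
    <?php the_excerpt(); /* replaces the_content() on listing pages */ ?>
<?php endwhile; endif; ?>
```

The title still links to the full post, so readers (and search engines) can reach the original content from every listing page.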
Conclusion
Addressing potential issues with duplicate content is one of those things that’s easy to put off – however, it really is worth taking a bit of time to sort things out. Applying the tips in this post will help you tidy up any issues and will minimize the risk of the search engines penalizing your site. In the next post I want to highlight some useful WordPress plugins that can help you with your SEO efforts – they’re all free and can help with some of the tedious tasks that need to be completed when ensuring a site is optimized for the search engines.
Table of Contents
1. Introduction and Guide Overview
2. Researching Great Keywords To Help Drive Traffic
3. 10 Ways You Can Generate Quality Backlinks To Boost Your WordPress SEO
4. Ten Quick Power Tips To Drive More Visitors To Your WordPress Site
5. WordPress SEO Issues – Managing Duplicate Content Effectively
6. Top 8 Free WordPress SEO Plugins
7. 8 Tips For Promoting Your WordPress Site
8. Great Tools For Analyzing The Impact Of Your WordPress SEO

December 4th, 2009 at 6:56 pm
Great post and tip, but I have one helpful tip.
You have is_single() || is_page() in the same if statement. The same thing can be accomplished using the is_singular() function. I have used this several times in my themes, and it makes it simpler to see what you are checking for and why you are checking.
Again, great tip!
December 5th, 2009 at 5:10 am
Thanks Matt – good suggestion
December 4th, 2009 at 11:37 pm
Hi Chris,
Very informative post. However, doesn’t the use of All in One SEO, or any of the other popular SEO plugins, now deal with this issue of duplicate content? Also, with regard to category archives, what is the difference between having a post linked to from another page or post (something that is recommended) and having a category archive? For example, let’s say I use five categories, A, B, C, D and E – each of these archive pages is obviously going to show a totally different set of posts. So are you saying that the fact the original article or post is being shown in an archive list is going to be classed as duplicate content? If so, then surely the same must be true of every internal link we use within our blogs. I have looked at my category archive pages and, to be honest, I cannot find another page which is the same anywhere on my blog. Yes, the excerpts shown link back to the original posts, but surely this is not a problem?
Regards Steve
December 5th, 2009 at 5:05 am
Hi Steve,
Thanks for your comment – yes, the All In One SEO Pack can help deal with the issue of duplicate content (I’m covering SEO plugins in the next part of the series) – it provides specific options that allow you to use “noindex” on all of your archive pages. In this post I wanted to discuss the issue more generally and touch on the sort of code that needs to be generated in order to deal with the problem. This hopefully gives people a better understanding of the issues (if that’s what they want), although I fully understand that many people would prefer to use a plugin that takes care of everything for them.
With regards to your second question about links on category archives – it’s not so much the link on archive pages that is the problem (as this points to the original URL); the problem arises if the full content of each post is displayed on those archive pages (in addition to the link). In such cases it’s best to ensure that the excerpt is used, as this will only include a subsection of the post, as opposed to the full content (sounds like you’re already doing this). The code in this post is useful as it tells the search engines not to index the archive pages, but it does still allow the links to be followed (which is potentially advantageous from an SEO perspective).
Hope this helps …
December 5th, 2009 at 1:49 pm
Wow, what a nice article. Thank you Chris. This guide will surely help me get rid of those duplicate contents.
December 5th, 2009 at 1:54 pm
Very detailed and informative. I like it. Thanks Chris!
December 7th, 2009 at 6:33 am
Great post. It was one of the reasons I lost my rankings from my blog, at least now I know. Thanks for sharing
December 9th, 2009 at 5:57 pm
Thanks Chris – glad it was useful.
December 9th, 2009 at 1:06 pm
Thanks, Chris! I’ve already put the meta tags to use in the header. Much appreciated.
Could I (should I?) use canonical tags to tell Google to treat all the sub-domains on my blog as a part of the main url – or is there a better way to do this?
December 9th, 2009 at 6:22 pm
Hi Amber – I guess the question of whether you should do this depends on whether or not you have duplicate content on your sub-domains. If you don’t, then it might not be worth doing – although I assume you probably do given that you asked the question.
I’ve not tested it myself, but I believe canonical tags should work for sub-domains. However, if you have any links/PageRank for your sub-domains it might be worth using a 301 redirect to tell the search engines that you want to *permanently* redirect the sub-domains to your main domain (this will help to keep the benefits of the links and maintain PageRank). There are plugins that could potentially help with this – for example, Redirection.
December 10th, 2009 at 1:30 am
The sub-domains are divisions of my blog, and all posts are syndicated on the front page. I did make the syndications post as excerpts so that should take care of the duplicate content issue. However, as individual articles get linked to, it raises the page rank of the sub-domain only. (Or at least I assume – they don’t all have the same page rank right now.) I was hoping they could all be “linked” SEO-wise to the main domain so that any links to any sub-domain would help the page rank of the main domain. Is that even possible? That’s what I was hoping to accomplish with the canonical tags.
December 10th, 2009 at 4:27 am
Canonical tags are primarily for dealing with duplicate content – what you’re talking about doing here is PageRank (PR) sculpting which is quite an advanced SEO topic. It is certainly possible to pass PR around a site – however, it may not always be best to focus your attention on your site’s homepage – especially for a WordPress site. This is because a blog homepage is dynamic and constantly changing – instead, you might want to focus more on your posts and static pages and optimize these for some specific keyphrases. Having a higher PR for these pages is good because it means you’re more likely to rank well in the search engines for the keywords you’ve targeted and as a result may drive more traffic to your site. However, as I say, there are lots of differing views on this – you might want to check out this interesting article on Page Rank Sculpting to learn more. Hope this helps
December 10th, 2009 at 1:01 pm
Thank you so much for all your help! I see your point. I’ll read that other article, too. Thanks again!
December 10th, 2009 at 3:58 pm
No problem Amber