Duplicate content on WordPress – hey, Google indexed my tag pages!

Recently on Facebook I’ve seen people asking about duplicate content on WordPress blogs. People are panicking, and unfortunately that’s understandable, because Google has been acting strangely lately. However, no-indexing ALL your tag and category pages is a mistake. You do want normal links between the pages of your site. You do want Google to FIND your older posts.

So here’s my recipe:

  1. Pick the one way of organizing your posts that matters most: tags or categories.
  2. Make sure this one is visible to users and search engines. In 9 cases out of 10 you do NOT need both.
    Having categories (usually your best pick) visible to search engines means not only that all your posts are visible to Google, as it crawls the category pages, but also that you get on-topic pages your users may particularly like. And as a bonus, if you have a theme like Thesis, you can add unique content at the top of these category pages (explaining the category, for instance) to make them more attractive to search engines.
  3. No-index and Follow the date-based archives (never useful) and whichever of tags or categories you’re not using. In many cases you also want the Author archives to be invisible to search engines. The easiest way to do this is with Yoast’s SEO plugin.
    You do want to ‘Follow’ all links, btw, because that way when someone links to one of these pages, at least the pages it links to will get seen. 
  4. Have one big sitemap for your whole blog. Like this 
    That way you can be sure all your posts and pages are visible to Google and get direct access to whatever pagerank your homepage has. Unfortunately, on this blog I use a custom version of an old plugin that’s no longer being developed, so I can’t help you actually DO this. On some blogs I simply go with Simple Yearly Archive.
  5. Use some plugin to interlink related posts. Right now I prefer nrelate.
    This is another way to make sure Google knows what your site is about, and to help individual posts rank.
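For readers who want to see what step 3 amounts to on the page itself: the usual mechanism is a robots meta tag in each page’s head. The Python sketch below is a hypothetical decision table (the function names and page-type strings are my own, not Yoast’s or WordPress’s API) that maps each kind of WordPress page to the directive the recipe calls for, assuming categories are the taxonomy you picked in step 1.

```python
from typing import Literal

# The taxonomy you picked in step 1 of the recipe.
Taxonomy = Literal["category", "tag"]


def robots_directive(page_type: str, primary: Taxonomy = "category",
                     hide_author: bool = True) -> str:
    """Return the robots meta content for a WordPress page type, following
    the recipe: index posts, pages and your chosen taxonomy; noindex,follow
    the other archives so their links still get crawled."""
    if page_type == "date":
        return "noindex,follow"
    if page_type == "author":
        # The recipe says "in many cases", so hiding author archives is a switch.
        return "noindex,follow" if hide_author else "index,follow"
    if page_type in ("category", "tag"):
        # Only the taxonomy picked in step 1 stays indexable.
        return "index,follow" if page_type == primary else "noindex,follow"
    # Posts, pages, the homepage and the sitemap page are indexed normally.
    return "index,follow"


def robots_meta_tag(page_type: str, primary: Taxonomy = "category",
                    hide_author: bool = True) -> str:
    """Render the directive as the <meta> tag that belongs in the page <head>."""
    content = robots_directive(page_type, primary, hide_author)
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    for page in ("home", "post", "category", "tag", "date", "author"):
        print(f"{page:9} {robots_meta_tag(page)}")
```

With categories as your pick, only the tag, date and author archives get `noindex,follow`; if you went with tags instead, pass `primary="tag"`. In practice an SEO plugin prints this tag for you; the sketch only makes the rule explicit.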

Now here’s my explanation

As I said, I understand the worry. However, that doesn’t mean we have to abandon the most basic common-sense approaches to website building and WordPress blogging. Google has given us some very clear guidelines about how it wants us to build websites. Here are a few:

Build for visitors first. Show visitors and search engines the same thing.

This means that if you have useful categories and tags, do show them to your users, and show them to search engines as well. If in doubt, make them MORE useful. (See the bonus in point 2 above.)

Links are the fabric on which the web is built.

The only links you have legitimate control over these days are the links WITHIN your website. Hiding tags and categories from search engines is, in effect, telling Google: don’t trust my site. The only pages Google can then reach from your homepage are the ones listed there: your latest 10 posts plus whatever posts and pages you have in your menu. The rest will be INVISIBLE to Google. NOT a good idea.
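To make that reasoning concrete, here is a toy crawl in Python. It assumes a hypothetical blog of 30 posts spread over three categories, whose homepage shows only the newest 10 posts plus the menu. A breadth-first search from the homepage stands in for Googlebot, and a "hidden" page is one the crawler never enters. That is a simplification: real crawlers also find pages through sitemaps and external links, and a noindex,follow page still passes the crawler along, which is exactly why step 3 says Follow.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set


def build_toy_blog(num_posts: int = 30, per_page: int = 10) -> Dict[str, List[str]]:
    """Hypothetical link graph: post-1 is the oldest post, post-N the newest.
    The homepage links to the newest `per_page` posts, the three category
    pages and an About page in the menu; each category page links to its posts."""
    posts = [f"post-{i}" for i in range(1, num_posts + 1)]
    categories = ["cat-seo", "cat-wordpress", "cat-squidoo"]
    graph: Dict[str, List[str]] = {page: [] for page in posts + categories + ["about"]}
    for i, post in enumerate(posts):
        # Assign posts to categories round-robin.
        graph[categories[i % len(categories)]].append(post)
    graph["home"] = posts[-per_page:] + categories + ["about"]
    return graph


def reachable(graph: Dict[str, List[str]], start: str = "home",
              hidden: FrozenSet[str] = frozenset()) -> Set[str]:
    """Breadth-first crawl from `start` that never enters pages in `hidden`."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in graph[page]:
            if link not in seen and link not in hidden:
                seen.add(link)
                queue.append(link)
    return seen


def count_posts(pages: Set[str]) -> int:
    """Count how many of the crawled pages are posts."""
    return sum(1 for page in pages if page.startswith("post-"))


if __name__ == "__main__":
    blog = build_toy_blog()
    hidden_categories = frozenset(["cat-seo", "cat-wordpress", "cat-squidoo"])
    print("posts found, categories visible:", count_posts(reachable(blog)))
    print("posts found, categories hidden: ",
          count_posts(reachable(blog, hidden=hidden_categories)))
```

With the category pages crawlable, all 30 posts are found; with them hidden, only the 10 linked from the homepage (post-21 to post-30) are. A sitemap page linked from the homepage, as in step 4, would also make every post reachable.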

So how about duplicate content?

Let GOOGLE worry about that. Seriously. Yes, the snippets from your posts will show up on your category and tag pages. But as long as your posts are generally longer than those snippets, Google is still likely to see and index the original post. After all, the category and tag pages LINK to that original post.

Still worried about that? Well, with tag pages there is some reason to worry. If you have posts about gadgets and they’re all tagged both ‘gadget’ and ‘electronics’, the tag pages for ‘gadget’ and ‘electronics’ are going to be pretty much the same. If Google notices that, and would otherwise have ranked one of those TAG PAGES for something, the two being identical is a problem: you’re splitting pagerank between the two TAG PAGES, which makes it less likely that either will be seen by search engine users. That’s a problem with your tagging. Most people tag very inexpertly anyhow (which is part of why I recommend that most people use categories instead), and yes, if you tag like that, you probably need to hide your tag pages from Google.
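If you want to check whether your own tagging has this problem, one simple measure is how much two tag pages overlap. The Python sketch below is an illustration under my own assumptions, not anything Google publishes: it scores each pair of tags by the Jaccard index of their post sets (posts shared by both tags divided by posts under either tag) and flags pairs above a threshold I picked arbitrarily (0.8). The function name and the sample tags are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def overlapping_tags(tags: Dict[str, Set[str]],
                     threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Return (tag_a, tag_b, score) for tag pairs whose pages list nearly the same posts.

    score = len(shared posts) / len(posts under either tag); 1.0 means the
    two tag pages would show exactly the same posts."""
    flagged = []
    for a, b in combinations(sorted(tags), 2):
        either = tags[a] | tags[b]
        if not either:
            continue  # two empty tags: nothing to compare
        score = len(tags[a] & tags[b]) / len(either)
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged


if __name__ == "__main__":
    # Hypothetical tag assignments; the post IDs are made up.
    tags = {
        "gadget": {"p1", "p2", "p3"},
        "electronics": {"p1", "p2", "p3"},
        "wordpress": {"p4", "p5"},
        "seo": {"p4", "p6"},
    }
    print(overlapping_tags(tags))
```

This prints `[('electronics', 'gadget', 1.0)]`: the ‘gadget’ and ‘electronics’ pages list exactly the same posts, which is the case described above. Merging such tags, or no-indexing the tag pages, removes the duplicate.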

So I should really NOT hide all my tags/categories etc. from Google?

Really. You should at least have EITHER your tags OR your categories visible to Google. Google can figure it out. Most (genuine) blogs online don’t worry about this stuff, out of ignorance, and a few of them are still getting bucketloads of traffic from Google or wherever. And yes, we do have Matt Cutts’ word on this: he notes that 25% of the web is duplicate content, and that’s normal. More from Matt on this issue: rel=canonical and duplicate content won’t hurt you, unless it’s spammy.

Postscript: what about the date archives?

Well, I did say in passing above that you should hide them from Google and users. Honestly, I forgot about them. The first thing I do when setting up any WordPress site is delete the date-based archive widget. Unless your blog is purely personal, I don’t think organizing by date is EVER as useful to readers as organizing by topic. I mean, who (except perhaps my mom) would want to know what I wrote in August 2013? You want to know what I’ve written recently about Squidoo, WordPress or SEO. If you have any readers at all, they’re interested in your topic. Even the New York Times organizes by topic first and date second. And since WordPress already organizes posts by date by default, you don’t need separate date-based archives. So hide them from users and search engines, which for search engines means no-index, follow.

2 thoughts on “Duplicate content on WordPress – hey, Google indexed my tag pages!”

  1. Thank you for this clarification, Katinka. I have to admit that Kathy’s thread was started because of a conversation she and I had yesterday, after I listened to an SEO expert talking about the duplicate content that shows up when you do a site:yoursiteurl check. It makes more sense to me now that you point out you don’t need both tags and categories, because categories alone will suffice. Writing a unique description for the category pages will also help significantly with the duplicate content problem.

    1. I think the problem is that when you take SEO advice out of context and simplify it, the point often gets lost. Do remember that even if you use only the categories, you still need to no-index the tags (and the author and date archives), because they still exist, even if nobody gets to see them. And if they exist, people can link to them.
