What Is Duplicate Content And How Do I Avoid It?

24 Feb, 2021

Preventing Duplicate Content & SEO Problems

We know that getting as many of the most important ranking factors right as possible is key to reaching the top of Google’s rankings, but it’s also important to remember that not all ranking factors carry the same weight: some take precedence over others. Having a good strategy for dealing with duplicate content is one of these. Someone who posts an article copied word for word from another site doesn’t need to worry about how many keywords are in the article’s meta description, because that article will almost certainly never be seen by anyone on search engines. Google will spot the plagiarism and won’t rank the article. This is just the type of duplicate content most people are familiar with. There are many other ways duplicate content and content similarities can create challenges for a digital marketer, so read on to find out more about them and the best ways to deal with them.

Google and Deceptive Duplicate Content

According to Google, duplicate content refers to:

“substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.”

Note that Google distinguishes between duplicate content that is deceptive and duplicate content that is not. Deceptive content misleads the user in some way: a site could be passing off stolen content as its own, or even using it to make users believe the website belongs to a business that it does not. Unsurprisingly, Google is particularly unforgiving of this kind of duplicate content, stating that it will

“make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.”

The likely ‘adjustment’ for stolen content is the removal of the offending articles from Google’s results. So until you’re clear on the rules of duplicate content, the other ranking factors can wait.

Keeping Copied Duplicate Content off Your Site

The solution to issues from blatantly copying other sites’ content word for word is simple: just don’t do it. You should never copy complete articles from other sites and try to pass them off as your own work. The industry term for this is scraping, and it’s very much frowned upon by Google, which sees scraped pages as adding no value for the user and as most likely infringing copyright. Google can detect this duplicate content and will not give it the same ranking value as the original. In most cases the search engines already know the original article, because the scraper most likely found it through Google in the first place.

You probably already knew that scraping was bad, so a more interesting question is: what about the grey areas? After all, everyone borrows a little from here and there for inspiration. If you are reformulating ideas from broader, more widely understood topics where most opinions aren’t original anyway, you are less likely to have problems. When writing an article, it’s easy to lose track of which text is your own and which is merely there for inspiration. If you are not sure whether enough of your text is original, a rough way to check is to paste sections of it into Google and see whether the article that influenced it comes up in the results. If it does, and most of the words in your section are highlighted, the text is too similar and needs to be changed. Significant paraphrasing will solve most duplicate content issues, but it won’t necessarily lead to good rankings.
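If you want a rough offline version of this check against one specific source article, a common measure of textual overlap is Jaccard similarity over word “shingles” (runs of three consecutive words). The Python sketch below is purely illustrative. It is not how Google detects duplicate content, and the 0.5 cut-off is an arbitrary assumption, not a Google rule.

```python
from __future__ import annotations

import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    source = "Smoked pork shoulder pairs well with a tangy vinegar BBQ sauce."
    draft = "Smoked pork shoulder pairs well with a tangy vinegar BBQ sauce, in my view."
    score = jaccard_similarity(source, draft)
    # Hypothetical cut-off for "too similar"; not an official threshold.
    print(f"similarity: {score:.2f}", "too similar" if score > 0.5 else "ok")
```

For the two sample sentences, 9 of the 12 distinct three-word runs are shared, so the script prints `similarity: 0.75 too similar`. Tacking a few words onto the end barely moves the score, which is why light rewording isn’t enough.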

The Issue of Similar But Not Duplicate Content

At this point, there is a split in approaches to duplicate content. Inexperienced content creators who are stretched for time will find top-ranking articles, reword them so they are no longer duplicate content, and post them. But these creators are missing the point: just because they aren’t technically breaking any rules doesn’t mean they aren’t doing something wrong. From a digital marketing perspective, it’s like the difference between breaking the speed limit and driving with your car in the wrong gear. It’s not illegal to drive around in second gear all the time, but you certainly won’t get the best out of your car by doing so.

The top-ranking version of an article on Google has been selected by Google as the best answer to that query and the best version of that idea. A reworded version of that article is neither a better version nor an interesting alternative, so it has little chance of reaching the top rankings. It is not enough to create content that isn’t technically duplicate content; you also need to create content that offers something extra for the user. If you find the prospect of creating new content daunting, don’t worry: you don’t have to reinvent the wheel. It can be as simple as summarizing and comparing ideas from multiple articles in your own words.

Next time you are looking for inspiration for an article, don’t settle for a single source. Try to find at least five, and write about where you agree or disagree with them and how their ideas connect. By doing this in your own words, you create something new and more valuable for users and for Google, with a much better chance of ranking.

URL Variations

It’s not just the content itself that can create duplicate content. Different URLs that lead to the same page can also cause duplicate content problems. Look out for issues with:

URL parameters added for click tracking
URL parameters added by some analytics code
Printer-friendly versions of pages
HTTP vs. HTTPS or WWW vs. non-WWW pages
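To see why these variations count as duplicates, the Python sketch below collapses several variants of one page into a single normalized URL. The rules it applies (force HTTPS, drop the www. prefix, remove tracking and print parameters, drop the #fragment) and the parameter names it strips (utm_*, plus hypothetical clickid and print parameters) are assumptions for illustration. Which form your own site should use, and how you tell Google about it, depends on your setup.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that change how a page is tracked or
# displayed, but not what it says. Adjust these for your own site.
TRACKING_PREFIXES = ("utm_",)
IGNORED_PARAMS = {"clickid", "print"}


def normalize_url(url: str) -> str:
    """Collapse common URL variants of the same page into one form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]  # treat www and non-www as one site
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS and not key.startswith(TRACKING_PREFIXES)
    ]
    path = parts.path or "/"
    # Always use HTTPS, sort remaining parameters so their order doesn't
    # matter, and drop the #fragment (it never reaches the server).
    return urlunsplit(("https", host, path, urlencode(sorted(query)), ""))


if __name__ == "__main__":
    variants = [
        "http://www.example.com/bbq-sauces",
        "https://example.com/bbq-sauces?utm_source=newsletter",
        "https://www.example.com/bbq-sauces?print=1",
        "https://example.com/bbq-sauces?clickid=abc123#reviews",
    ]
    # All four variants serve the same content, so they normalize alike.
    print({normalize_url(u) for u in variants})
```

All four sample URLs reduce to `https://example.com/bbq-sauces`, so the script prints a set containing just that one URL. A search engine that hasn’t been told otherwise, though, sees four separate addresses serving the same content.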

Duplicating Your Own Content

Your website can also run into problems if too much of its own content is similar. This won’t raise issues of intellectual property theft, but it can reduce the chances of those pages ranking for the terms they are targeting. When Google finds two pages on your site that are “appreciably similar”, and possibly targeting the same search term, it can have trouble deciding which page should be the authority for that term. This can result in the rankings being split between the two pieces of content, or in one page being prioritized over the other. Neither outcome makes the most of your content.

Avoiding Duplicate Content Within Your Own Site

A content plan that identifies the keyword terms each article is targeting is an important way to make sure your content isn’t competing against itself. If you haven’t kept track of all your previous posts, and there are quite a few of them in different parts of your site, you can get a good overview of your pages by logging into the Google Analytics account linked to your website and clicking:

- Behaviour

- Site Content

- All pages

This will give you a useful list of your site's pages along with some interesting audience statistics.

rankingCoach users can find a task inside rankingCoach that shows them how to set up and link a Google Analytics account. They can also use rankingCoach’s keyword ranking tools to see which terms each page is ranking for.

When creating future content, it’s fine to write multiple articles targeting similar phrases, as long as each one approaches the topic in a different way. For example, a BBQ website’s article on the top five BBQ sauces to eat with pork isn’t going to compete for the same traffic as an article about ‘the top 5 BBQ sauce recipes’.
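A content plan can be as simple as a mapping from each page to the main term it targets. Under that assumption, the hypothetical Python sketch below flags any term that more than one page is chasing. The URLs and keywords are made-up examples. It only catches exact matches (after ignoring case and extra spaces), so near-identical phrases still need a human eye.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical content plan: each page URL mapped to the main term it targets.
content_plan = {
    "/blog/top-5-bbq-sauces-for-pork": "best bbq sauce for pork",
    "/blog/top-5-bbq-sauce-recipes": "bbq sauce recipes",
    "/blog/pulled-pork-sauce-guide": "Best BBQ sauce for  pork",
}


def find_competing_pages(plan: dict[str, str]) -> dict[str, list[str]]:
    """Return each target keyword that more than one page is chasing."""
    pages_by_keyword: dict[str, list[str]] = defaultdict(list)
    for url, keyword in plan.items():
        # Normalize case and spacing so trivial differences don't hide a clash.
        pages_by_keyword[" ".join(keyword.lower().split())].append(url)
    return {kw: urls for kw, urls in pages_by_keyword.items() if len(urls) > 1}


if __name__ == "__main__":
    for keyword, urls in find_competing_pages(content_plan).items():
        print(f"'{keyword}' is targeted by: {', '.join(urls)}")
```

Here the two pork-sauce pages are flagged as competing for “best bbq sauce for pork”, while the recipes article, which targets a different phrase, is left alone.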

Duplicate Content from Site Coding Issues (hreflang)

It is not unusual for a website to have multiple versions of itself in the same language that are adapted for different territories. For example, a website could have one version in UK English and another with US spelling. This is a great way of providing a personal touch for customers in these different territories.

These additional versions need to be set up correctly, or Google may wrongly interpret them as duplicate content. This can mean that neither version ranks well, or that the wrong version of an article is shown to searchers in a given country, which also limits the extra ranking potential of localized spelling and adapted content.

Why This Duplicate Content Is Created

This normally happens because the versions of a page for different territories don’t refer to one another. Put simply, a page should be set up so that when Googlebot finds it, the page tells it:

‘How do you do? I’m the UK version of this article; show me to visitors from the UK, the Republic of Ireland, India, etc. This other page over here isn’t a content thief. He is my US brother; show him to searchers in the United States.’
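In practice, “referring to one another” means every version lists all the others, and each of those lists it back. Google’s hreflang documentation notes that if two pages don’t both point to each other, the annotations may be ignored. The Python sketch below checks for missing return links in a hypothetical map of the alternates each page declares. Gathering that map (for example, by crawling your own pages) is assumed and not shown.

```python
from __future__ import annotations

# Hypothetical map of each page URL to the alternates it declares
# (hreflang code -> URL). In practice you would collect this by crawling.
declared = {
    "https://example.com/uk/bbq-sauces": {
        "en-GB": "https://example.com/uk/bbq-sauces",
        "en-US": "https://example.com/us/bbq-sauces",
    },
    "https://example.com/us/bbq-sauces": {
        "en-US": "https://example.com/us/bbq-sauces",
        # The en-GB link back to the UK page is missing.
    },
}


def missing_return_links(pages: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """List (page, alternate) pairs where the alternate doesn't link back."""
    problems = []
    for page, alternates in pages.items():
        for alternate in alternates.values():
            if alternate == page:
                continue  # a page listing itself is expected, not a problem
            if page not in pages.get(alternate, {}).values():
                problems.append((page, alternate))
    return problems


if __name__ == "__main__":
    for page, alternate in missing_return_links(declared):
        print(f"{alternate} does not link back to {page}")
```

With the sample data, it reports that the US page doesn’t link back to the UK page. That is exactly the one-sided setup that leaves Google unsure whether it is looking at a brother or a content thief.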

Avoiding Duplicates of Localized Pages

There are several ways to tell Googlebot about the regional versions of your pages. All of them use language codes, set in what are known as hreflang tags, to show Google which language a page uses and which region it is for. For instance, an English page for an American audience uses the code en-US, whereas a UK English page uses en-GB.
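In the HTML method (the first of the ways listed below), every regional version carries the same set of `<link rel="alternate" hreflang="...">` elements in its `<head>`, including one pointing to itself. The Python sketch below builds that set for two regional versions. The example.com URLs are placeholders.

```python
from __future__ import annotations

from html import escape

# Hypothetical regional versions of one article: hreflang code -> URL.
VERSIONS = {
    "en-GB": "https://example.com/uk/bbq-sauces",
    "en-US": "https://example.com/us/bbq-sauces",
}


def hreflang_link_tags(versions: dict[str, str]) -> str:
    """Build the <link> elements each version should carry in its <head>.

    The same full set goes on every version, including a link to itself.
    """
    return "\n".join(
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}">'
        for code, url in versions.items()
    )


if __name__ == "__main__":
    print(hreflang_link_tags(VERSIONS))
```

Running it prints `<link rel="alternate" hreflang="en-GB" href="https://example.com/uk/bbq-sauces">` followed by the matching en-US line. Both lines would go into the `<head>` of both the UK and the US page.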

Google recommends using at least one of these three ways of providing these codes (hreflang tags) to avoid duplicate content confusion:

Placing hreflang tags in the HTML of each page of your site

Putting hreflang annotations, along with the URL, into the HTTP headers of pages; this method is normally used for non-HTML files such as PDFs

Submitting a sitemap to Google that includes the language codes for each page
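The other two methods carry the same information in different places. The Python sketch below uses placeholder example.com URLs. It builds an HTTP `Link` header value for two hypothetical PDF versions, and a sitemap in which each page’s `<url>` entry lists every regional version through `xhtml:link` elements. It only produces the text. Serving the header and submitting the sitemap (for example, through Google Search Console) depend on your own server and setup, and are not shown.

```python
from __future__ import annotations

from xml.sax.saxutils import escape, quoteattr

# Hypothetical non-HTML files (e.g. PDFs): hreflang code -> URL.
PDF_VERSIONS = {
    "en-GB": "https://example.com/uk/price-list.pdf",
    "en-US": "https://example.com/us/price-list.pdf",
}

# Hypothetical HTML pages: hreflang code -> URL.
PAGE_VERSIONS = {
    "en-GB": "https://example.com/uk/bbq-sauces",
    "en-US": "https://example.com/us/bbq-sauces",
}


def link_header(versions: dict[str, str]) -> str:
    """Value for an HTTP Link header listing every version of a file."""
    return ", ".join(
        f'<{url}>; rel="alternate"; hreflang="{code}"'
        for code, url in versions.items()
    )


def sitemap_xml(versions: dict[str, str]) -> str:
    """Sitemap with one <url> entry per version, each listing all versions."""
    alternates = "\n".join(
        f'    <xhtml:link rel="alternate" hreflang={quoteattr(code)} href={quoteattr(url)}/>'
        for code, url in versions.items()
    )
    entries = "\n".join(
        f"  <url>\n    <loc>{escape(url)}</loc>\n{alternates}\n  </url>"
        for url in versions.values()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"\n'
        '        xmlns:xhtml="http://www.w3.org/1999/xhtml">\n'
        f"{entries}\n"
        "</urlset>"
    )


if __name__ == "__main__":
    print("Link:", link_header(PDF_VERSIONS))
    print(sitemap_xml(PAGE_VERSIONS))
```

As with the HTML method, each version lists all versions, itself included, so the UK and US entries in the sitemap carry identical sets of `xhtml:link` lines.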

For more information on how to display these codes, see Google’s FAQ on localized versions of pages.

We hope this article has helped you develop some good strategies for dealing with duplicate content, and that you won’t be seeing double again!