Duplicate content study
There has been a lot of talk about duplicate content. Duplicate content is not all that bad, but if most of the content on your website is duplicate content, then it is a problem.
Then why is duplicate content a concern?
This is a very important question to ask. Let’s say:
- You have created great content on page A.html.
- Your system (a CMS, a forum, etc.) created another page, A1.html, which is a simpler version of the same page. It is called Page B in the cases below.
- Someone copied your content and put it on their website at XYZ/Z.html. (This can happen even if all you do is provide RSS syndication.)
Case A
- Page A gets 20 points (through links and other on-page factors for keyword A)
- Page B gets 10 points
- Page XYZ/Z.html gets 2 points.
In this case Page A will (or at least might) rank, and you are OK with that.
Case B
- Page A gets 20 points
- Page B gets 30 points (as someone mentioned it on social media and a few more people linked to Page B)
- Page XYZ/Z.html gets 2 points.
In this case Page B will rank, which is not OK for you, because you haven’t optimized its template for conversion and other things. You always wanted Page A to rank for keyword A, not Page B.
Case C: (This is the worst of all.)
- Page A gets 20 points
- Page B gets 10 points
- Page XYZ/Z.html gets 25 points (as it belongs to a stronger site and a few more people linked to this page)
In this case Page XYZ/Z.html will rank. WHAT??? You created the content and someone else is ranking for it. Yes, it happens quite often. In this case, if Page A and Page B of your site could combine their strength (20 + 10 = 30 points), they could win over the XYZ/Z.html page.
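The arithmetic behind Case C can be sketched in a few lines of Python. This is only an illustration of the point system used in this post: the scores are the made-up numbers from above, the function names are hypothetical, and it assumes that a duplicate's points are simply added to the page it is folded into, which is a simplification of how search engines treat redirects and canonical hints.

```python
# Illustrative scores from Case C; real ranking signals are far more complex.
scores = {
    "Page A": 20,
    "Page B": 10,
    "XYZ/Z.html": 25,
}

def top_page(scores):
    """Return the page with the highest score."""
    return max(scores, key=scores.get)

def consolidate(scores, duplicate, canonical):
    """Return new scores with the duplicate's points folded into the canonical page.

    Assumption: points add up in full, which is a simplification.
    """
    merged = dict(scores)
    merged[canonical] += merged.pop(duplicate)
    return merged

combined = consolidate(scores, "Page B", "Page A")

print(top_page(scores))    # XYZ/Z.html
print(top_page(combined))  # Page A
```

With the two pages combined, Page A holds 30 points against 25 for the copied page, which is the outcome the steps below aim for.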
Important points about Duplicate content
- It is very difficult for any search engine to detect the original source of content unless we take the right steps.
- Duplicate content is not all that bad unless your site is full of it.
- Search engines like Google don’t want to show the same copy to users several times, so they will pick one of the web pages for a particular piece of content.
Steps to avoid Duplicate content
- Show only one page per piece of content: either redirect the duplicates with a 301 or use the new canonical tag. This way you combine all your strength under one page.
- Check across the web for copies of your content using copyscape.com, and send legal notices to the people copying you. Never link to these copied pages from your website. Also check who is linking to them and tell those sites that the content was copied and that the original is available on your site. This way you will earn more points and get your page ranking.
- Be aware of syndication. Make sure that all the syndicated pages link back to your original content.
- Use your sitemap effectively.
- You need to use our advanced system. As a hint of what we do: we keep adding the thousands of keywords we discover to our system, and we assign each keyword the page that we want to rank for it. Any mismatch generates a message in our system, and we can then look at the pages involved. We do a lot of advanced automated checking.
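The first step above (one page per piece of content) can be sketched as follows. This is a minimal illustration, not a drop-in implementation: the URL mapping, the function names and the `https://www.example.com` origin are all assumptions made for the example, and in practice the redirect would normally be configured in your web server or CMS.

```python
from html import escape

# Hypothetical mapping from duplicate URLs to the one page that should rank.
CANONICAL = {
    "/A1.html": "/A.html",
    "/A.html?print=1": "/A.html",
}

def canonical_for(path):
    """Return the preferred URL for a path (the path itself if it has no entry)."""
    return CANONICAL.get(path, path)

def redirect_response(path):
    """Return (status, headers): a 301 to the canonical URL, or 200 if already canonical."""
    target = canonical_for(path)
    if target != path:
        return 301, {"Location": target}
    return 200, {}

def canonical_link_tag(path, origin="https://www.example.com"):
    """Build the <link rel="canonical"> element to place in the page's <head>."""
    href = escape(origin + canonical_for(path), quote=True)
    return f'<link rel="canonical" href="{href}">'

print(redirect_response("/A1.html"))
print(canonical_link_tag("/A1.html"))
```

Use the 301 when visitors do not need the duplicate URL at all, and the canonical tag when the duplicate has to stay reachable (for example a print version) but should pass its strength to the main page.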
Good pages on Duplicate content
- Google Blog – Specify your canonical – February 12, 2009
- More webmaster questions – Answered! – September 23, 2008
- Demystifying the “duplicate content penalty” – September 12, 2008
- Duplicate content due to scrapers – June 09, 2008
- New: Content analysis and Sitemap details, plus more languages – December 13, 2007
- Google, duplicate content caused by URL parameters, and you – September 12, 2007
- Google blog – Duplicate content summit at SMX Advanced – June 13, 2007
- Google blog – Deftly dealing with duplicate content – December 18, 2006
Patents on Duplicate content
- Methods and apparatus for estimating similarity
- Detecting query-specific duplicate documents
- Detecting duplicate and near-duplicate files
Notes about duplicate content
- Assignment of the hashing vectors is an important factor.
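To give a feel for why the hashing matters, here is a rough Python sketch of one well-known family of near-duplicate fingerprints (often called simhash). It is not taken from the patents listed above and is not a description of what any search engine actually runs. The whitespace tokenizer, the 64-bit size and the use of MD5 as the per-word hash are all assumptions chosen to keep the example short; in this sketch each word's hash plays the role of its "hashing vector".

```python
import hashlib

def token_hash(token, bits=64):
    """Stable hash of a token, truncated to `bits` bits (MD5 chosen only for determinism)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)

def simhash(text, bits=64):
    """Fingerprint a text: every token votes +1 or -1 on each bit position."""
    votes = [0] * bits
    for token in text.lower().split():
        h = token_hash(token, bits)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set when the votes for that position are positive.
    return sum(1 << i for i in range(bits) if votes[i] > 0)

def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

original = "you have created great content on page A"
copied = "you have created great content on page A today"
unrelated = "use your sitemap effectively and watch syndication"

print(hamming_distance(simhash(original), simhash(copied)))
print(hamming_distance(simhash(original), simhash(unrelated)))
```

Pages that share most of their words tend to end up with fingerprints that differ in only a few bits, while unrelated pages tend to differ in many more. How the words are hashed and weighted decides how well that separation works, which is the point of the note above.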