Duplicate content can impact your rankings negatively, but not in the way you might think. Can I use content from other sources? Is there a duplicate content penalty? Read on for the answer to these questions and more.
What is Duplicate Content?
Duplicate content occurs when the same, or very similar, content appears on multiple pages of a single site or across multiple websites.
Most people think of blocks of content copied from one page to another or one site to another as duplicate content. This is the most obvious form, yes. But it doesn’t have to be an exact one-to-one copy for search engines to decide the content is too similar. In addition, “content” doesn’t have to refer to the words displayed on the page. Metadata — such as title tags and meta descriptions — can be considered duplicate content as well.
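To make "too similar" concrete, here is a minimal sketch of one common way to score textual overlap: Jaccard similarity over three-word "shingles." This is an illustration only. Search engines do not publish how they detect near-duplicates, and the function names, the shingle size, and the sample page copy are all assumptions made for this example.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical page copy, invented for this example.
page_a = "Our red running shoes are lightweight and built for long distances."
page_b = "Our blue running shoes are lightweight and built for long distances."
page_c = "Read our guide to choosing a tent for winter camping trips."

print(jaccard_similarity(page_a, page_b))  # high: only one word differs
print(jaccard_similarity(page_a, page_c))  # low: unrelated copy
```

A score near 1.0 flags pages worth reviewing by hand. Any cutoff you choose is a judgment call for your own audit, not a threshold search engines are known to use.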
How Does Duplicate Content Impact Rankings?
The question on everyone’s mind is, “Will I get penalized for duplicate content?” The answer is a bit more complicated than a simple yes or no. In most cases, Google will not directly penalize you for duplicate content. In fact, as far back as 2008, Google stated clearly that there was no such thing as a duplicate content penalty.
Even if there isn’t a direct penalty for duplicate content, it can still impact your rankings in several ways.
Link Equity
Pages that link to each other pass link equity — a signal of importance — through the links to help search engines understand which pages a site thinks should rank most strongly. The more pages that link to a specific page, both internally within the site and externally from other sites, the more important search engines perceive that page to be.
Duplicate content waters down these signals by spreading the link equity across multiple versions of a page of content, resulting in each duplicate page receiving less. Consequently, pages with duplicate content rarely have the visibility they need to rank strongly.
Internal Competition
When you have multiple copies of a single page of content, it can be difficult for search engines to determine which version to index and rank. The pages end up competing with each other, when a single, strong page would be better able to win rankings for the site. There’s enough external competition for rankings; don’t compete against yourself as well.
Crawl Budget
Duplicate content can also eat into your crawl budget, the number of pages a search engine will crawl within a given time period. During each visit to a site, search engine crawlers have to both discover new content and recrawl pages they already know exist. If they’re wasting time crawling duplicate content, they’re not spending as much time as they could discovering new or refreshed content.
Why Does Duplicate Content Exist?
Duplicate content can be created intentionally, but many times it flies completely under the radar. In fact, the technology that drives your website is probably quietly generating duplicate content as you read this. Knowing what causes duplicate content is essential to minimizing its impact.
Copied Content
First and foremost, if you’re using duplicate content for SEO purposes intentionally, stop. Search engines are pretty good at determining the original source of a piece of content. It’s unlikely that you’ll get SEO “credit” for posting copied content, and it can even hurt you if all of the content on your site is sourced from other sites. If you are currently ranking with content from other sites, consider it a short-term situation that could end at any time.
Syndicated Content
Even when a site uses copied content legally, as in a syndication arrangement, the content is usually either canonicalized to the originating site or noindexed on the receiving site. That way, the originating site typically keeps the canonical, or original, version of the content for ranking purposes.
URL Structure
The structure of your site’s URLs may also be causing duplicate content issues. Possible culprits include:
- Subdomain: www vs. non-www (https://www.example.com vs. https://example.com);
- Character casing: Mixed case URLs (https://example.com/SEE vs. https://example.com/see);
- Trailing slash: Trailing vs. non-trailing slash (https://example.com/see/ vs. https://example.com/see);
- Site functions: For example, printer-friendly pages (https://example.com/see/print vs. https://example.com/see).
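To see how these variants collapse into one page, here is a rough sketch that normalizes each URL to a single preferred form and groups the duplicates. The preferred form (non-www, lowercase, no trailing slash) and the "/print" suffix handling are assumptions for this example, not rules every site should follow. In particular, lowercasing paths is only safe if your server treats paths as case-insensitive.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Map URL variants to one preferred form (assumed conventions)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: the non-www host is preferred
    path = parts.path.lower()  # assumption: paths are case-insensitive
    if path.endswith("/print"):
        path = path[: -len("/print")]  # hypothetical printer-friendly suffix
    path = path.rstrip("/") or "/"  # drop trailing slash, keep the root
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


# The example URLs from the list above.
crawled = [
    "https://www.example.com/see",
    "https://example.com/SEE",
    "https://example.com/see/",
    "https://example.com/see/print",
    "https://example.com/see",
]

groups = defaultdict(list)
for url in crawled:
    groups[normalize_url(url)].append(url)

for preferred, variants in groups.items():
    print(preferred, "<-", variants)
```

Run against a crawl export, any group with more than one variant points to a set of URLs serving the same content.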
Ecommerce Filtering & Sorting Parameters
While filtering and sorting are critical for shoppers on ecommerce sites, they can generate duplicate content for search engines. For example, the content is essentially the same for a category page that shows 24 results and one that shows 48 results, or one that displays the results in alphabetical order vs. best selling.
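One way to picture the fix is to strip the parameters that only change how results are presented, leaving a single URL for the category. The sketch below does this with Python's standard library. The parameter names (`sort`, `per_page`, `view`) and the example URLs are hypothetical; which parameters count as presentation-only depends on your platform.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; substitute the ones your platform uses.
PRESENTATION_PARAMS = {"sort", "per_page", "view"}


def canonical_category_url(url: str) -> str:
    """Drop presentation-only parameters and order the rest consistently."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in PRESENTATION_PARAMS
    ]
    kept.sort()  # stable ordering so ?a=1&b=2 and ?b=2&a=1 match
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


print(canonical_category_url("https://example.com/shoes?per_page=48&sort=best-selling"))
print(canonical_category_url("https://example.com/shoes?sort=name&color=red"))
```

The second URL keeps `color=red` because this sketch treats a filter that changes which products appear as a different page. Whether filtered pages should share the category's URL is a decision for your own site.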
International Content
When the default language for two or more countries is the same, as it is with English in the U.S., much of Canada, and the U.K., the content written for those countries can be incredibly similar or even the same except for currency changes or other small differences.
Ecommerce Product Descriptions
Sites that sell other companies’ products tend to use product descriptions written by those manufacturers to save time. However, when many sites do this, all of the product pages across those sites end up sending essentially the same relevance signals. When that’s the case, rankings will largely be determined by other factors, like link authority, which tends to disadvantage smaller players who don’t have the resources to custom-write new product descriptions.
Resolving Duplicate Content
There are several ways to remove duplicate content from search engines’ indexes, and a handful of others that don’t work the way people think they should. In most cases, you’ll want to use either a canonical tag or a 301 redirect to consolidate the duplicate pages into a single preferred version. Which option you use depends on whether your site’s visitors still need to see the duplicate page (canonical tag) or not (301 redirect). Read more about the options in our post on “4 Ways To Remove Indexed Content & 3 That Don’t Work.”
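The choice between the two can be sketched as a small decision function. This is an illustration under assumptions, not a drop-in implementation: the function name, the return shape, and the example URL are invented here, and how you actually emit a header or a tag depends on your server or CMS.

```python
from html import escape


def resolve_duplicate(preferred_url: str, visitors_need_page: bool) -> dict:
    """Describe how a duplicate page should point to the preferred URL."""
    if visitors_need_page:
        # Keep serving the page, but tell search engines which version
        # to index by placing a canonical tag in the page's <head>.
        tag = f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'
        return {"status": 200, "head_html": tag}
    # Nobody needs this URL: send visitors and crawlers to the preferred
    # page with a permanent redirect.
    return {"status": 301, "headers": {"Location": preferred_url}}


# A printer-friendly page visitors still use: keep it, add a canonical tag.
print(resolve_duplicate("https://example.com/see", visitors_need_page=True))

# A mixed-case URL nobody needs: redirect it.
print(resolve_duplicate("https://example.com/see", visitors_need_page=False))
```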
While there is no hard and fast rule for how much duplicate content is acceptable on a site, minimizing it whenever possible remains essential. Google and other search engines might not penalize your site directly, but duplicate content can still negatively affect your rankings.