Nearly every site has an internal search engine that searches only that site, and it rarely appears on anyone's list of security risks. Internal search is also typically a non-entity for search engine optimization. After all, search engine crawlers generally don't use search bars. There is, however, a way for spammers to hijack your internal site search to create spam.
A Real-Life Example of Internal Site Search Spam
Here's how internal site search spam works: A spammer uses your internal site search URL string to "create" low-value pages that contain keywords and URLs. Then, they create a third-party page that links to those low-value internal site search URLs on your site and use Google Search Console to request indexing of that third-party page. That prompts Google to crawl from the third-party page to your low-value internal site search URLs, getting them discovered and potentially indexed.
Too confusing? Here's an example.
One of our clients (we'll call them Client-A) had an internal site search URL string that looked like this: https://www.client-a.com/search?q=search-term. The spam was generated using this scenario:
- Spammer-B identifies the internal site search results page URL on https://www.client-a.com.
- Spammer-B creates a page on its own spam site, for example, https://www.spammer-b.com/some-crummy-page.
- That page (https://www.spammer-b.com/some-crummy-page) links to a bunch of Client-A's internal site search URLs. For example, perhaps they search for the query "free viagra www.spam-site-c.com" in the internal site search, and it generates a zero-results internal search results page at this URL: https://www.client-a.com/search?q=free-viagra-www.spam-site-c.com. The spammers grab that URL and create a link to it on https://www.spammer-b.com/some-crummy-page.
- Spammer-B requests that Google index https://www.spammer-b.com/some-crummy-page in Google Search Console.
- Google crawls https://www.spammer-b.com/some-crummy-page and discovers the links to Client-A's internal site search pages, like https://www.client-a.com/search?q=free-viagra-www.spam-site-c.com.
Why would spammers do this? The theory is that they are creating mentions of a URL and a keyword on a domain that has authority, some of which would then transfer to the spammer's domain.
It's incredibly unlikely that this would actually result in a transfer of value to the spammer's site, but that doesn't stop them from trying. What it does do is create a whole mess of low-value, zero-result internal site search URLs for Google to crawl, wasting your crawl budget.
To see if you have this problem today, look for your internal site search URLs under "Discovered – currently not indexed" in the Pages report in Google Search Console.
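If the report is long, a short script can help triage an exported list of URLs. The sketch below is only a heuristic: it assumes the /search path and q parameter from the Client-A example, and the DOMAIN_LIKE pattern and function names are our own illustrative choices, not anything Google defines.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed from the Client-A example: search lives at /search with a "q" parameter.
SEARCH_PATH = "/search"
QUERY_PARAM = "q"

# Heuristic (an assumption): a site search query that embeds something that
# looks like a URL or hostname is worth a manual look.
DOMAIN_LIKE = re.compile(
    r"https?://|www\.|\b[a-z0-9-]+\.(?:com|net|org)\b", re.IGNORECASE
)


def is_suspicious_search_url(url: str) -> bool:
    """Return True for internal search URLs whose query looks like it contains a domain."""
    parts = urlsplit(url)
    if parts.path.rstrip("/") != SEARCH_PATH:
        return False
    terms = parse_qs(parts.query).get(QUERY_PARAM, [])
    return any(DOMAIN_LIKE.search(term) for term in terms)


def find_suspicious(urls):
    """Filter a list of URLs down to the suspicious internal search URLs."""
    return [u for u in urls if is_suspicious_search_url(u)]
```

Run against the example above, find_suspicious would flag https://www.client-a.com/search?q=free-viagra-www.spam-site-c.com and skip an ordinary search such as https://www.client-a.com/search?q=blue+widgets. Expect some false positives, so treat the output as a list to review rather than a verdict.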
How to Prevent Internal Site Search Spam
Better yet, take preventive action before internal site search spam becomes an issue.
There are two ways to prevent internal site search spam from taking hold on your site. You can choose to block internal site search results from being indexed or from being crawled.
Block Indexing
Of the two choices, blocking indexation is the better option because it ensures that these internal site search results pages won't be indexed by Google and won't appear in search results. Simply add a meta robots tag with a content value of noindex to the head of the internal site search results page template. The line of code looks like this:
<meta name="robots" content="noindex">
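Once the tag is deployed, it's worth confirming that it actually renders on your search results pages. Here is a minimal sketch using Python's standard html.parser module. The class and function names are hypothetical, and it only inspects the HTML string you pass in; it doesn't fetch pages or look at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalize to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a meta robots tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Feed it the HTML source of a search results page, such as the one at https://www.client-a.com/search?q=search-term in the example, and it should return True once the tag is in place.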
Block Crawling
Unlike a meta robots noindex tag, a Disallow rule in the robots.txt file doesn't prevent Google from indexing internal site search results pages. It only requests that bots not crawl the pages indicated.
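As an illustration, here is what a disallow rule for the Client-A example might look like, along with a quick local check. This is a sketch under assumptions: the /search path comes from the example URL, the helper name is hypothetical, and Python's urllib.robotparser is not Google's own parser, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule for the Client-A example: ask all bots not to crawl /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this rule, is_crawl_allowed(ROBOTS_TXT, "https://www.client-a.com/search?q=free-viagra-www.spam-site-c.com") returns False, while a URL outside the search path is still allowed. The rule matches by prefix, so it would also cover any other path that begins with /search.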
It's important to note, however, that if Google has already indexed internal site search spam on your site, blocking crawling alone won't remove it from the index. You need to first noindex the affected pages, and Google has to be able to crawl them to see that tag. After they have been deindexed, you can disallow them in the robots.txt file to save crawl budget.
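The order of operations matters here, so it may help to see it as a small decision function. This is only a sketch of the reasoning above; the function name, its inputs, and the step labels are hypothetical, and you would determine each input yourself (for example, from Google Search Console and your robots.txt file).

```python
def next_step(indexed: bool, has_noindex: bool, disallowed: bool) -> str:
    """Suggest the next cleanup step for an internal search results URL."""
    if indexed:
        if disallowed:
            # A crawl block hides the noindex tag from Google, so lift it first.
            return "remove_disallow"
        if not has_noindex:
            return "add_noindex"
        # The tag is in place and crawlable; Google needs to recrawl the page.
        return "wait_for_deindex"
    if not disallowed:
        # Nothing is indexed, so crawling can now be blocked to save crawl budget.
        return "add_disallow"
    return "done"
```

For a spam URL that is indexed, has no noindex tag, and isn't disallowed, the function returns "add_noindex"; only after the URL drops out of the index does it return "add_disallow".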
Preventing internal site search spam is a simple yet crucial step in SEO and site security. It prevents bad actors from manipulating your internal site search results for nefarious purposes and protects your crawl budget from being wasted in the process. Take the time to check your Google Search Console reporting today for evidence of this type of spam and, if you find any, take steps to remove it.