How do I prevent pages from being crawled?



Problem

I want to prevent certain pages from being indexed and appearing in search results.

Solution

You can add the data-sj-noindex attribute to any HTML element on a page to prevent that page from being indexed. Most commonly it is added to a robots meta tag in the <head> of the page, as follows:

  1. Locate the <head> tag of the page you want to prevent from being crawled.

  2. Add the following code within the <head>:
    <meta name="robots" content="noindex" data-sj-noindex />

  3. Save the changes. The crawler will skip this page the next time it comes across it.
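
For reference, here is a minimal sketch of how the page markup might look once the tag is in place. The title and body content are placeholders; only the meta tag is required.

  <!DOCTYPE html>
  <html>
    <head>
      <!-- The standard robots noindex hint, plus data-sj-noindex, tells the crawler to skip this page -->
      <meta name="robots" content="noindex" data-sj-noindex />
      <title>Example page that should not appear in search results</title>
    </head>
    <body>
      <!-- Page content -->
    </body>
  </html>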

Additionally, you can use crawling rules to programmatically exclude entire sections or specific pages of your website.


Related articles

Documentation

Advanced crawler documentation
