I want to prevent certain pages from being indexed and appearing in search results.

Solution

You can add an HTML element with the data-sj-noindex attribute anywhere in a page, and that page will not be indexed. Most commonly this is defined in the <head> of an HTML page as follows:

  1. Locate the <head> tag of the page you want to prevent from being crawled.

  2. Add the following code within the <head>:
    <meta name="robots" content="noindex" data-sj-noindex />

  3. Save the changes. The crawler will ignore this page next time it comes across it.
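
To illustrate where the tag from step 2 sits, here is a minimal sketch of a complete page. The title and body text are placeholders invented for this example; only the meta tag carrying data-sj-noindex comes from the steps above.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <!-- Placeholder title, not part of the original instructions -->
    <title>Example page to exclude</title>
    <!-- The tag from step 2: marks this page so the crawler skips it -->
    <meta name="robots" content="noindex" data-sj-noindex />
  </head>
  <body>
    <!-- Placeholder content -->
    <p>This page should not appear in search results.</p>
  </body>
</html>
```

Keeping the tag inside the <head>, next to any other meta tags, makes it easy to find and remove if the page should be indexed again later.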

Additionally, you can use crawling rules to programmatically exclude sections or certain pages of your website: https://www.sajari.com/docs/user-guide/indexing-data/advanced-crawler

Documentation

Advanced crawler documentation