How does crawling and indexing work?

Answer

Without our ping-back code installed, indexing is done by following links starting with the root of the domain.

For instance, for sajari.com the crawler starts with http://www.sajari.com and follows links from there. Here's some more information on how Sajari works with domains.
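To make the link-following idea concrete, here is a minimal sketch of a breadth-first crawl that starts at the root of a domain and only follows links on the same host. It is not Sajari's actual crawler: the `SITE` dictionary, the `crawl` function, and the `fetch_links` helper are hypothetical names invented for this illustration, and the pages are held in memory rather than fetched over HTTP.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical site used for illustration: each URL maps to the hrefs on that page.
SITE = {
    "http://www.sajari.com/": ["/pricing", "/blog", "https://example.org/other"],
    "http://www.sajari.com/pricing": ["/", "/blog"],
    "http://www.sajari.com/blog": ["/blog/post-1"],
    "http://www.sajari.com/blog/post-1": [],
}


def fetch_links(url):
    """Stand-in for downloading a page and extracting its links (assumption)."""
    return SITE.get(url, [])


def crawl(root, fetch_links):
    """Breadth-first crawl from the root, staying on the root's host."""
    host = urlparse(root).netloc
    seen = {root}
    queue = deque([root])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for href in fetch_links(url):
            link = urljoin(url, href)  # resolve relative links against the page URL
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


pages = crawl("http://www.sajari.com/", fetch_links)
```

In this sketch a page that no other page links to is never discovered, which is the practical consequence of indexing by following links from the root.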

Our crawler visits the webpages of the domains you add to your collection. Read “How the crawler works” for more information.

Once the collection is indexed, the crawler periodically revisits every webpage in the index, every 3 to 7 days.

If you have our Instant Indexing ping-back code installed on your website, a new or updated webpage is updated in the collection as soon as the page is visited, rather than waiting for the periodic crawl cycle, which might take 3-7 days. This happens when any of the following fields are updated and the page is visited within 30 minutes of being updated:

  1. title

  2. description

  3. canonical

  4. robots

  5. og:title

  6. og:image

  7. og:description
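The rule above can be sketched as a small check. This is only an illustration of the logic as described, not the actual Instant Indexing code: `WATCHED_FIELDS`, `changed_fields`, and `should_reindex` are hypothetical names, and the sketch assumes the 30 minutes are measured from the moment the page was updated to the moment it is visited.

```python
from datetime import datetime, timedelta

# The seven fields listed above.
WATCHED_FIELDS = (
    "title",
    "description",
    "canonical",
    "robots",
    "og:title",
    "og:image",
    "og:description",
)

# Assumption: the window runs from the update time to the visit time.
VISIT_WINDOW = timedelta(minutes=30)


def changed_fields(indexed, current):
    """Return the watched fields whose values differ between two records."""
    return [f for f in WATCHED_FIELDS if indexed.get(f) != current.get(f)]


def should_reindex(indexed, current, updated_at, visited_at):
    """True when a watched field changed and the visit falls inside the window."""
    within_window = timedelta(0) <= visited_at - updated_at <= VISIT_WINDOW
    return within_window and bool(changed_fields(indexed, current))


# Usage: the title changed and the page was visited 10 minutes after the update.
old = {"title": "Pricing", "robots": "index,follow"}
new = {"title": "Pricing and plans", "robots": "index,follow"}
updated = datetime(2020, 1, 1, 12, 0)
result = should_reindex(old, new, updated, updated + timedelta(minutes=10))
```

Under these assumptions, a change to a field outside the list (the body text, for example) would not trigger an update on its own and would be picked up by the next periodic crawl instead.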

Instant indexing does not remove records from the index if a webpage’s status code changes to 404, 403, 301, or 302. However, the regular crawl cycle does take the status code into account and removes the page from the index if the page returns any of these codes.
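The difference between the two paths can be sketched as follows. This is an illustrative model only: the `index` dictionary, the `apply_visit` function, and the `source` values `"crawl"` and `"instant"` are hypothetical, and the sketch assumes that only a 200 response updates a record and that instant indexing leaves the existing record untouched when it sees one of the four codes.

```python
# Status codes that cause the regular crawl to drop a page.
REMOVAL_STATUS_CODES = {404, 403, 301, 302}


def apply_visit(index, url, status_code, record, source):
    """Apply one page visit to the index.

    source is "crawl" for the periodic crawl cycle or "instant" for
    instant indexing (both labels are assumptions for this sketch).
    """
    if status_code in REMOVAL_STATUS_CODES:
        if source == "crawl":
            index.pop(url, None)  # the regular crawl removes the page
        # instant indexing: the existing record is left as it is
        return
    if status_code == 200:
        index[url] = record  # assumption: only successful responses update


# Usage: a page starts returning 404.
index = {"http://www.sajari.com/old-page": {"title": "Old page"}}
apply_visit(index, "http://www.sajari.com/old-page", 404, None, "instant")
still_indexed = "http://www.sajari.com/old-page" in index
apply_visit(index, "http://www.sajari.com/old-page", 404, None, "crawl")
removed = "http://www.sajari.com/old-page" not in index
```

In practice this means a deleted or redirected page can remain searchable until the next periodic crawl reaches it.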