How does crawling and indexing work?
Answer
Without our ping-back code installed, indexing is done by following links starting with the root of the domain.
For instance, for sajari.com the crawler starts at http://www.sajari.com and follows links from there. Here's some more information on how Sajari works with domains.
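As a rough illustration of link-following from a domain root, here is a minimal sketch. It is not Sajari's crawler: the `crawl` function, the `LinkExtractor` class, the `fetch` callable, and the toy `SITE` pages are all hypothetical, and an in-memory dictionary stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(root_url, fetch):
    """Breadth-first crawl from root_url, staying on the root's host.

    `fetch` is a hypothetical callable mapping a URL to its HTML,
    or to None when the page is unavailable.
    Returns the fetched URLs in visit order.
    """
    host = urlparse(root_url).netloc
    queue = deque([root_url])
    seen = {root_url}
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Toy in-memory site (made-up pages) standing in for real HTTP requests.
SITE = {
    "http://www.sajari.com": '<a href="/pricing">Pricing</a> <a href="/blog">Blog</a>',
    "http://www.sajari.com/pricing": '<a href="/blog">Blog</a>',
    "http://www.sajari.com/blog": '<a href="/blog/post-1">Post</a> <a href="http://example.com/">External</a>',
    "http://www.sajari.com/blog/post-1": "",
    "http://www.sajari.com/orphan": "",
}

pages = crawl("http://www.sajari.com", SITE.get)
```

In this sketch the `/orphan` page is never reached because no other page links to it, and the external link is skipped because it points to a different host.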
Our crawler visits the webpages of the domains you add to your collection. Read “How the crawler works” for more information.
Once the collection is indexed, the crawler re-visits all the webpages in the index periodically, every 3-7 days.
If you have our Instant Indexing ping-back code installed on your website, any new or updated webpage is updated in the collection as soon as the page is visited, rather than waiting for the periodic crawl cycle, which might take 3-7 days. Instant Indexing updates a page in the collection when:
any of the following page meta fields is updated (title, description, canonical value, robots), and
the page is visited within 30 minutes of being updated.
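The two conditions above can be sketched as a simple check. This is only an illustration under stated assumptions: the function names, the dictionary layout for page metadata, and the timestamps are hypothetical and do not reflect how the ping-back code is actually implemented.

```python
from datetime import datetime, timedelta

# Meta fields named in the list above; the dictionary keys are assumed names.
TRACKED_META_FIELDS = ("title", "description", "canonical", "robots")
VISIT_WINDOW = timedelta(minutes=30)


def changed_meta_fields(indexed, current):
    """Return the tracked meta fields whose values differ."""
    return [f for f in TRACKED_META_FIELDS if indexed.get(f) != current.get(f)]


def qualifies_for_instant_update(indexed, current, updated_at, visited_at):
    """True when a tracked field changed AND the visit came within 30 minutes."""
    within_window = timedelta(0) <= visited_at - updated_at <= VISIT_WINDOW
    return bool(changed_meta_fields(indexed, current)) and within_window


# Made-up example data.
indexed = {
    "title": "Pricing",
    "description": "Plans",
    "canonical": "http://www.sajari.com/pricing",
    "robots": "index",
}
current = dict(indexed, title="Pricing and plans")
updated_at = datetime(2024, 1, 1, 12, 0)

soon = qualifies_for_instant_update(
    indexed, current, updated_at, updated_at + timedelta(minutes=10)
)
late = qualifies_for_instant_update(
    indexed, current, updated_at, updated_at + timedelta(minutes=45)
)
unchanged = qualifies_for_instant_update(
    indexed, indexed, updated_at, updated_at + timedelta(minutes=10)
)
```

Here `soon` is true, while `late` fails the 30-minute window and `unchanged` fails because no tracked meta field differs.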
Instant Indexing does not remove records from the index if a webpage’s status code changes to a 404, 403, 301, or 302. However, the regular crawl cycle does take the status code into account and will remove the page from the index if the page returns a 404, 403, 301, or 302.
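The difference between the two paths can be sketched as follows. The `apply_visit` function, the `source` labels, and the dictionary used as an index are hypothetical names chosen for illustration; only the list of status codes comes from the text above.

```python
# Status codes that cause removal during the regular crawl cycle.
REMOVAL_STATUS_CODES = {301, 302, 403, 404}


def apply_visit(index, url, status_code, source):
    """Update a toy index (a dict keyed by URL) after a page visit.

    `source` is "instant" for a ping-back visit or "crawl" for the
    periodic crawl. Only the crawl path removes records.
    """
    if source not in ("instant", "crawl"):
        raise ValueError(f"unknown source: {source!r}")
    if source == "crawl" and status_code in REMOVAL_STATUS_CODES:
        index.pop(url, None)
    return index


# Made-up example record.
index = {"http://www.sajari.com/old-page": {"title": "Old page"}}

# A ping-back visit that sees a 404 leaves the record in place.
apply_visit(index, "http://www.sajari.com/old-page", 404, "instant")
kept_after_instant = "http://www.sajari.com/old-page" in index

# The next periodic crawl sees the 404 and removes the record.
apply_visit(index, "http://www.sajari.com/old-page", 404, "crawl")
kept_after_crawl = "http://www.sajari.com/old-page" in index
```

In practice this means a deleted or redirected page can remain searchable until the next periodic crawl reaches it.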