How to index a sitemap?

Sitemaps help point our crawler to webpages that are not linked from anywhere else on your website. When you add a new domain, we look for a sitemap at the root of the domain (e.g. www.website.com/sitemap.xml) and index all the links listed in it.
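To make the behaviour above concrete, here is a minimal Python sketch of the two ideas involved: deriving the root sitemap location for a domain, and listing the links a standard XML sitemap contains. It is an illustration only and not the crawler's actual implementation. The function names, the sample sitemap, and the choice of https as the default scheme are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol for standard XML sitemaps.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def root_sitemap_url(domain):
    """Return the conventional root sitemap URL for a domain (hypothetical helper)."""
    # Accept a bare host such as "www.website.com" or a full URL.
    # Assumption: default to https when no scheme is given.
    if "://" not in domain:
        domain = "https://" + domain
    parts = urlsplit(domain)
    return f"{parts.scheme}://{parts.netloc}/sitemap.xml"


def sitemap_links(xml_text):
    """Return every <loc> value found in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


# Illustrative sitemap content; the URLs are placeholders.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.website.com/</loc></url>
  <url><loc>https://www.website.com/unlinked-page</loc></url>
</urlset>
"""

if __name__ == "__main__":
    print(root_sitemap_url("www.website.com"))
    for link in sitemap_links(SAMPLE_SITEMAP):
        print(link)
```

Running a check like this against your own sitemap is a quick way to confirm that the pages you expect to be indexed are actually listed in it.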

This guide explains how you can index a sitemap for your website collection.

Instructions

Follow these steps to add your sitemap to your collection:

  1. Check whether a sitemap exists on your website by going to www.yourwebsite.com/sitemap.xml. If a sitemap is already present, skip to step 3.

  2. If the sitemap is missing, ask your website developer or technical team to add one to your website. For more information on standard XML sitemaps, please see:

    1. https://www.sitemaps.org/protocol.html

    2. Example: www.website.com/sitemap.xml

  3. Make sure the name of the sitemap ends with "sitemap.xml", otherwise it will not be crawled. The sitemap can be located under any directory (e.g. www.yourwebsite.com/site/first-sitemap.xml).

  4. Log in to your console and select the relevant collection.

  5. Navigate to the Crawler crawl statuses section.

  6. In the textbox underneath the page heading, enter the URL of the sitemap (e.g. www.yourwebsite.com/sitemap.xml) and press "Diagnose". A modal will pop up showing that the URL has not been crawled.

  7. Click "Crawl page".

The sitemap will be indexed and pages will start being added to your collection. This might take anywhere from a few minutes to a few hours, depending on the number of pages on your website and the number of pages waiting in index queues.

If you click on "Open in page debugger", you might see a MIME error in the Page Debug tool. This error can be ignored, and your sitemap and all the links on your sitemap will be indexed.
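Before submitting a URL in step 6, you can check it against the naming rule from step 3. The following Python sketch tests whether the path of a URL ends with "sitemap.xml". The function name is hypothetical, and two details are assumptions that this guide does not specify: the comparison is treated as case-sensitive, and https is assumed when the URL has no scheme.

```python
from urllib.parse import urlsplit


def has_crawlable_sitemap_name(url):
    """Return True if the URL path ends with "sitemap.xml" (hypothetical helper)."""
    # Assumption: default to https when no scheme is given, so that
    # urlsplit separates the host from the path correctly.
    if "://" not in url:
        url = "https://" + url
    path = urlsplit(url).path
    # Assumption: the match is case-sensitive.
    return path.endswith("sitemap.xml")


if __name__ == "__main__":
    for candidate in (
        "www.yourwebsite.com/sitemap.xml",
        "www.yourwebsite.com/site/first-sitemap.xml",
        "www.yourwebsite.com/sitemap_index.xml",
    ):
        print(candidate, has_crawlable_sitemap_name(candidate))
```

A URL that fails this check, such as www.yourwebsite.com/sitemap_index.xml, would need to be renamed on the website before it can be crawled as a sitemap.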

 
