Sitemaps can be helpful in pointing our crawler to webpages that are not internally linked within your website. When you add a new domain, we look for a sitemap at the root of the domain (e.g. www.website.com/sitemap.xml) and index all the links on the sitemap.

This guide explains how you can index a sitemap for your website collection.
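
For orientation, a standard XML sitemap is a short file that lists page URLs inside a `urlset` element, as described by the sitemaps.org protocol. The following is a minimal sketch of how such a file could be generated in Python. The function name `build_sitemap` and the example page URLs are assumptions made for illustration; your website developer may generate the sitemap differently.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return a minimal sitemap XML document (as a string) listing page_urls.

    Hypothetical helper for illustration only.
    """
    # Serialize the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in page_urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc_el = ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc")
        loc_el.text = page_url  # special characters are escaped on output
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Example pages (assumed); include pages that are not linked internally.
    print(build_sitemap([
        "https://www.yourwebsite.com/",
        "https://www.yourwebsite.com/hidden-page",
    ]))
```

The output would be saved under a name ending in "sitemap.xml" and served from your website so that the crawler can fetch it.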

...

  1. Check if the sitemap exists on your website by going to www.yourwebsite.com/sitemap.xml. If a sitemap is already present, skip to step 3.

  2. If the sitemap is missing, ask your website developer or technical team to add a sitemap to your website. For more information on standard XML sitemaps, please see:

    1. https://www.sitemaps.org/protocol.html

    2. Example: www.website.com/sitemap.xml

  3. The name of the sitemap must end with "sitemap.xml" in order to be crawled. It can be located under any directory (e.g. www.yourwebsite.com/site/first-sitemap.xml).

  4. Log in to your console and select the relevant collection.

  5. Open the Crawler crawl statuses section.

  6. In the textbox underneath the page heading, enter the URL of the sitemap (e.g. www.yourwebsite.com/sitemap.xml) and press "Diagnose".
    A modal will pop up showing that the URL has not been crawled.

  7. Click "Crawl page".
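
Before submitting a URL in the console, the naming rule from the steps above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `has_crawlable_sitemap_name` and `default_sitemap_url` are hypothetical, the check is case-sensitive, and `https://` is assumed when no scheme is given. It only inspects the URL text and does not contact the website or the crawler.

```python
from urllib.parse import urlparse


def has_crawlable_sitemap_name(url):
    """Return True if the URL path ends with "sitemap.xml".

    Hypothetical helper; mirrors the naming rule in the steps above.
    """
    # Accept URLs written without a scheme, e.g. "www.yourwebsite.com/sitemap.xml".
    if "://" not in url:
        url = "https://" + url  # assumption: the site is served over HTTPS
    # Only the path is checked, so a query string does not affect the result.
    return urlparse(url).path.endswith("sitemap.xml")


def default_sitemap_url(domain):
    """Return the root sitemap location that is checked when a domain is added."""
    return f"https://{domain}/sitemap.xml"


if __name__ == "__main__":
    for candidate in (
        "www.yourwebsite.com/sitemap.xml",
        "www.yourwebsite.com/site/first-sitemap.xml",
        "www.yourwebsite.com/pages.xml",
    ):
        print(candidate, has_crawlable_sitemap_name(candidate))
```

A URL that fails this check, such as one ending in "pages.xml", would need to be renamed before it can be crawled as a sitemap.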

The sitemap will be indexed and its pages will start being added to your collection. This might take a few minutes or a few hours, depending on the number of pages on your website and the number of pages waiting in our index queues.

If you click on "Open in page debugger", you might see a MIME error in the Page Debug tool. This error can be ignored; your sitemap and all the links on it will still be indexed.

...
