This Confluence instance is now read-only. Please head over to the Algolia Confluence instance for the same information, kept more up to date.
What HTML elements does Search.io crawl?
Use this list as a reference for the specific HTML elements that Search.io crawls to index your data. Each entry gives the field name in parentheses, followed by where its value comes from.
URL (url). The full URL of the page.
Dir1, Dir2 (dir1, dir2). Taken from the URL of the page.
Canonical (canonical). The canonical URL for the page. Taken from <link rel="canonical"> (https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls#rel-canonical-link-method).
Title (title). The title of the page, taken from <title></title> (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/title).
Image (image). The Open Graph image tag for the page, taken from <meta property="og:image"> (https://ogp.me/).
Language (lang). Language of the page content (en, fr, de, ...), taken from <html lang=""> (https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/lang).
Description (description). The meta description of the page, taken from <meta name="description"> (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta/name).
Keywords (keywords). List of keywords for the page, taken from <meta name="keywords" /> (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta/name).
Modified Time (modified_time). The time when the page was last modified. Taken from <meta property="article:modified_time"> or schema.org modifiedTime: https://schema.org/modifiedTime
Published Time (published_time). The time when the page was first published. Taken from <meta property="article:published_time"> or schema.org datePublished: https://schema.org/datePublished
Headings (headings). List of headings (<h1>-<h6>) from the body of the page (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Heading_Elements).
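
To check what a crawler is likely to see on one of your pages, the mapping above can be approximated locally. The following is a minimal sketch using only the Python standard library; it is not Search.io's crawler. The names PageFieldParser and extract_fields are hypothetical, the sample page is made up, and treating dir1 and dir2 as the first two path segments of the URL is an assumption, since the list only says they are taken from the URL. The schema.org fallbacks for the two time fields are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageFieldParser(HTMLParser):
    """Hypothetical helper: collects the listed fields from one HTML page."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    # <meta property="..."> values mapped to the field names in the list.
    META_PROPERTIES = {
        "og:image": "image",
        "article:modified_time": "modified_time",
        "article:published_time": "published_time",
    }

    def __init__(self):
        super().__init__()
        self.fields = {"headings": []}
        self._capture = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.fields["lang"] = attrs["lang"]
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.fields["canonical"] = attrs.get("href")
        elif tag == "meta":
            self._handle_meta(attrs)
        elif tag == "title" or tag in self.HEADING_TAGS:
            self._capture = tag
            self._buffer = []

    def _handle_meta(self, attrs):
        content = attrs.get("content")
        name = attrs.get("name")
        prop = attrs.get("property")
        if name == "description":
            self.fields["description"] = content
        elif name == "keywords" and content:
            # Assumption: keywords are comma-separated.
            self.fields["keywords"] = [
                k.strip() for k in content.split(",") if k.strip()
            ]
        elif prop in self.META_PROPERTIES:
            self.fields[self.META_PROPERTIES[prop]] = content

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return
        # Collapse runs of whitespace inside the captured text.
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.fields["title"] = text
        else:
            self.fields["headings"].append(text)
        self._capture = None


def extract_fields(url, html):
    """Return a dict of the listed fields for one page (illustrative only)."""
    parser = PageFieldParser()
    parser.feed(html)
    parser.close()
    fields = dict(parser.fields)
    fields["url"] = url
    # Assumption: dir1 and dir2 are the first two path segments of the URL.
    segments = [s for s in urlparse(url).path.split("/") if s]
    for i, segment in enumerate(segments[:2], start=1):
        fields[f"dir{i}"] = segment
    return fields


SAMPLE = """<html lang="en"><head>
<title>Hello  World</title>
<link rel="canonical" href="https://example.com/blog/2021/hello">
<meta name="description" content="A greeting.">
<meta name="keywords" content="hello, world" />
<meta property="og:image" content="https://example.com/img.png">
<meta property="article:published_time" content="2021-01-01T00:00:00Z">
</head><body><h1>Hello</h1><h2>Sub <em>heading</em></h2></body></html>"""

fields = extract_fields("https://example.com/blog/2021/hello", SAMPLE)
for key in sorted(fields):
    print(key, "=", fields[key])
```

Fields whose source tag is missing from the page, such as modified_time in this sample, are simply absent from the result, which makes it easy to spot tags you may want to add before the next crawl.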