Most websites include metadata that records when a publication (such as a blog post, article, or report) was published or updated. You can index this metadata in your collection to display it in search results and to use it for sorting or filtering.

This how-to guide describes the standard metadata fields that the search.io crawler detects and explains how to index date metadata in your collection.

...

A web page might have multiple date metadata fields, for example datePublished, modifiedDate, and lastupdated. The crawler indexes date metadata automatically when best practices are followed and the standard meta fields are used.

Below are examples of the standard meta fields that our crawler detects and indexes automatically:

Open Graph Protocol

  Property                 Expected Type   Description
  article:published_time   DateTime        When the article was first published.
  article:modified_time    DateTime        When the article was last changed.

Note: If a datetime does not include a timezone, we parse it as UTC.
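To make the UTC rule concrete, here is a minimal Python sketch of how a date value could be normalised: a value without an offset is treated as UTC, and a value with an offset is converted to UTC. The function name parse_meta_datetime is hypothetical, and this is not the crawler's actual parser; it only illustrates the behaviour described in the note.

```python
from datetime import datetime, timezone


def parse_meta_datetime(value: str) -> datetime:
    """Hypothetical helper (not the search.io crawler's code).

    Parses an ISO 8601 date or datetime string and treats a missing
    timezone as UTC.
    """
    text = value.strip()
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so rewrite it as an explicit +00:00 offset for older versions.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # No timezone given: assume UTC, mirroring the rule in the note.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Express every value in UTC so dates from different pages compare cleanly.
    return parsed.astimezone(timezone.utc)
```

With this rule, 2023-05-01T09:30:00 is read as 09:30 UTC, while 2023-05-01T09:30:00+10:00 becomes 23:30 UTC on 30 April. If your pages are published in a local timezone, including the offset in the metadata avoids dates shifting by several hours.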

Schema.org

  Property        Expected Type      Description
  dateCreated     Date or DateTime   The date on which the CreativeWork was created or the item was added to a DataFeed.
  dateModified    Date or DateTime   The date on which the CreativeWork was most recently modified or when the item's entry was modified within a DataFeed.
  datePublished   Date               Date of first broadcast/publication.
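Before crawling, you may want to confirm locally which of these standard properties a page actually exposes. The following Python sketch uses the standard library's html.parser to collect Open Graph <meta property="..."> values and schema.org microdata itemprop values (taken from a content attribute, or from the datetime attribute of a <time> element). The names DateMetaCollector and find_date_metadata are hypothetical, the sketch assumes microdata markup, and it is not how the search.io crawler is implemented.

```python
from html.parser import HTMLParser

# Property names from the Open Graph and Schema.org tables above.
OG_DATE_PROPERTIES = {"article:published_time", "article:modified_time"}
SCHEMA_DATE_PROPERTIES = {"dateCreated", "dateModified", "datePublished"}


class DateMetaCollector(HTMLParser):
    """Hypothetical local check: collects standard date metadata from a page."""

    def __init__(self):
        super().__init__()
        self.dates = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Open Graph: <meta property="article:published_time" content="...">
        prop = attributes.get("property")
        if tag == "meta" and prop in OG_DATE_PROPERTIES and attributes.get("content"):
            self.dates[prop] = attributes["content"]
        # Schema.org microdata: the name is in itemprop; the value is usually in
        # "content" (e.g. on <meta>) or "datetime" (on <time>).
        itemprop = attributes.get("itemprop")
        if itemprop in SCHEMA_DATE_PROPERTIES:
            value = attributes.get("content") or attributes.get("datetime")
            if value:
                self.dates[itemprop] = value


def find_date_metadata(html):
    """Return a dict mapping each detected date property to its raw value."""
    collector = DateMetaCollector()
    collector.feed(html)
    collector.close()
    return collector.dates
```

For a page whose only date markup is <meta property="article:published_time" content="2023-05-01T09:30:00+10:00">, find_date_metadata returns {'article:published_time': '2023-05-01T09:30:00+10:00'}. An empty dict suggests the page uses neither of these conventions.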

Info

We also support schema.org entities in JSON-LD format for a few of our enterprise clients, and we are working on supporting JSON-LD by default. If you are a current customer and want to use JSON-LD, get in touch via our Service Desk.
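If your pages describe dates in JSON-LD, a similar local check can read the application/ld+json script blocks. This sketch is hypothetical and uses only the standard library: it parses each block with json.loads, skips blocks that are not valid JSON, and walks nested objects and arrays (such as @graph) for dateCreated, dateModified, and datePublished. It only shows what the page contains; as noted above, JSON-LD is not indexed by default.

```python
import json
from html.parser import HTMLParser

SCHEMA_DATE_KEYS = ("dateCreated", "dateModified", "datePublished")


class JsonLdExtractor(HTMLParser):
    """Hypothetical helper: gathers parsed JSON-LD blocks from a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip blocks that are not valid JSON.


def collect_dates(node, found):
    """Recursively record the first value seen for each schema.org date key."""
    if isinstance(node, dict):
        for key in SCHEMA_DATE_KEYS:
            if isinstance(node.get(key), str):
                found.setdefault(key, node[key])
        for value in node.values():
            collect_dates(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_dates(item, found)
    return found


def find_jsonld_dates(html):
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = {}
    for block in parser.blocks:
        collect_dates(block, found)
    return found
```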

Custom Metadata

You can also add a custom date metadata field to your website and index that field instead. Read more about how to index custom fields in your collection.

...

How to index date metadata

Instructions

  1. Identify the metadata field that you want to index in your collection.

  2. If the field uses Open Graph Protocol or Schema.org entities, skip to step 4.

  3. Add a schema field for the date metadata field via the Schema section of the console, and add a data-sj-field="fieldname" attribute to the metadata element (see detailed instructions).

  4. Re-index a sample page via the "Diagnose" tool in the Domains Crawler section. Once the page is indexed, its record should be updated with the metadata in the new field.

  5. Check that the record has been indexed and has the correct field value. You can check this in the Preview section: use "Expand all" to display all fields, and use a filter (e.g. "filter":"url='http://www.url.com'") to check a specific page.

  6. Once you have verified that the metadata is indexed correctly, crawl all pages in the Domains Crawler section.

You can also use the Page Debug tool to check if the crawler detects the date metadata that you want to add.
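For a custom field (step 3), a quick local check can confirm that the data-sj-field attribute is in place before you re-index a sample page. This Python sketch is hypothetical and assumes the field value comes from the element's content attribute, as on a <meta> tag; consult the detailed instructions for how the crawler reads values from other elements.

```python
from html.parser import HTMLParser


class SjFieldCollector(HTMLParser):
    """Hypothetical helper: records elements tagged with data-sj-field."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        field_name = attributes.get("data-sj-field")
        if field_name:
            # Assumption: the value is read from the "content" attribute (as on
            # <meta> tags); None flags an element without a content attribute.
            self.fields[field_name] = attributes.get("content")


def find_sj_fields(html):
    """Map each data-sj-field name on the page to its content value."""
    collector = SjFieldCollector()
    collector.feed(html)
    collector.close()
    return collector.fields
```

For example, <meta name="lastupdated" content="2023-06-02" data-sj-field="lastupdated"> yields {'lastupdated': '2023-06-02'}. A None value suggests the attribute sits on an element that has no content attribute, which is worth checking against the detailed instructions.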

...

After successfully indexing your date metadata, you can use the indexed date field to:

...

Documentation

Sorting

Filter records based on a timestamp field

Filtering content
