...

Now let’s see how it’s done. Here are the steps we’re taking.

Table of Contents

Getting started 

There are a few initial steps involved in getting Sajari search off the ground:

...

We will walk through each of these below.

Defining a schema

Schemas are all about performance. If you aren’t convinced, we wrote an article on schema vs. schemaless just to convince you 😉

...

Note: If you’re using our website crawler, Shopify, or another connector, this will happen automatically.

Configure pipelines

The configuration of an intelligent search algorithm can be extremely complicated. So, we re-imagined how engineers can build search by creating pipelines. Pipelines break down search configuration into smaller pieces that can be easily mixed, matched, and combined to create an incredibly powerful search experience. Pipelines are highly composable and extendable. You can read a quick pipeline overview here and more specific details related to this example below.

...

There is much more you can do with pipelines, which is somewhat expanded on below, but it’s worth noting you also don’t need much to get started.

Query pipelines

Query pipelines define the query execution and results ranking strategies used when searching the records in your collection. 

...

Let’s walk through just a few of the customizations you can add to search using query pipelines. Below are some explanations of step configurations used in this example. 

Field weights

Field weights allow matches in different fields to be worth more when searching. For example, a match in the product name is probably more valuable than a match in the description.

...

Code Block
# Index lookups
- id: set-score-mode
  params:
    mode:
      const: MAX
- id: index-text-index-boost
  params:
    field:
      const: title
    score:
      const: "1"
    text:
      bind: q
- id: index-text-index-boost
  params:
    field:
      const: body_html
    score:
      const: "0.06125"
    text:
      bind: q
- id: index-text-index-boost
  params:
    field:
      const: vendor
    score:
      const: 0.5
    text:
      bind: q
- id: index-text-index-boost
  params:
    field:
      const: product_type
    score:
      const: 0.25
    text:
      bind: q
- id: index-text-index-boost
  params:
    field:
      const: tags
    score:
      const: 0.125
    text:
      bind: q
- id: index-text-index-boost
  params:
    field:
      const: image_tags
    score:
      const: 0.06125
    text:
      bind: q
- id: index-text-index-boost
  params:
    field:
      const: variant_titles
    score:
      const: 0.06125
    text:
      bind: q
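
To make the effect of these weights concrete, here is a minimal Python sketch. It is not Sajari’s implementation: it assumes the MAX score mode means a record’s text score is the largest boost among the fields that matched the query. The weights are copied from the configuration above, while `text_score` and the naive substring match are hypothetical stand-ins for real index lookups.

```python
# Illustrative only: assumes MAX score mode keeps the highest boost among
# the fields that match. The substring match is a naive stand-in for an
# index lookup.
FIELD_WEIGHTS = {
    "title": 1.0,
    "vendor": 0.5,
    "product_type": 0.25,
    "tags": 0.125,
    "body_html": 0.06125,
    "image_tags": 0.06125,
    "variant_titles": 0.06125,
}


def text_score(record: dict, query: str) -> float:
    """Return the largest field weight among fields containing the query."""
    q = query.lower()
    matched = [
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if q in str(record.get(field, "")).lower()
    ]
    return max(matched, default=0.0)


records = [
    {"title": "Wool socks", "body_html": "Pairs well with boots"},
    {"title": "Leather boots", "body_html": "Sturdy winter boots"},
]
# A title match outranks a description-only match.
ranked = sorted(records, key=lambda r: text_score(r, "boots"), reverse=True)
```

With these weights, a record that only mentions the query in its description can never outrank one that matches in the title.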

Spelling correction

Spelling is hard! If you’ve tried a fuzzy match in other search engines, you’ll know it’s underwhelming. It also typically slows things down a lot!

...

The nice thing about Sajari spelling is that it also learns over time. As people execute more queries, spelling suggestions and autocomplete get better at predicting what your customers most likely want.
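
The idea of suggestions that improve with query volume can be sketched in a few lines of Python. This is an assumption-laden toy, not how Sajari spelling works internally: the `SpellingSuggester` class is hypothetical, candidate matching uses the standard library’s `difflib.get_close_matches`, and popularity is a plain query counter.

```python
from collections import Counter
from difflib import get_close_matches


class SpellingSuggester:
    """Toy suggester: close string matches, ranked by how often they were queried."""

    def __init__(self) -> None:
        self.query_counts: Counter = Counter()

    def record_query(self, query: str) -> None:
        # Every executed query makes that spelling a stronger candidate.
        self.query_counts[query.strip().lower()] += 1

    def suggest(self, query: str, limit: int = 3) -> list:
        candidates = get_close_matches(
            query.strip().lower(), list(self.query_counts), n=10, cutoff=0.7
        )
        # Stable sort: ties keep difflib's similarity ordering.
        return sorted(
            candidates, key=lambda c: self.query_counts[c], reverse=True
        )[:limit]


suggester = SpellingSuggester()
for _ in range(5):
    suggester.record_query("sneakers")
suggester.record_query("speakers")
suggestions = suggester.suggest("sneekers")
```

The more often a spelling is queried, the higher it ranks among the close matches for a misspelled input.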

Business boosts

The next part looks at business metrics and how they impact ranking. You may want to promote items that sell more frequently, have inventory in stock, have higher user ratings, or anything else you can think of!
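
As a rough illustration of how business metrics can feed into ranking, the Python sketch below multiplies a text relevance score by a boost built from stock status, rating, and sales. The `Product` fields, the `boosted_score` function, and the boost sizes are all hypothetical choices for this example, not Sajari defaults.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    text_score: float  # relevance from the text match, assumed to be in 0..1
    in_stock: bool
    rating: float  # average user rating, assumed to be in 0..5
    units_sold: int


def boosted_score(product: Product, max_units_sold: int) -> float:
    """Scale the text score by a small boost for each business signal."""
    boost = 0.0
    if product.in_stock:
        boost += 0.2  # hypothetical boost sizes throughout
    boost += 0.1 * (product.rating / 5.0)
    if max_units_sold > 0:
        boost += 0.1 * (product.units_sold / max_units_sold)
    return product.text_score * (1.0 + boost)


catalog = [
    Product("Trail shoe", 0.8, in_stock=False, rating=4.0, units_sold=10),
    Product("Road shoe", 0.8, in_stock=True, rating=4.5, units_sold=40),
]
top_seller = max(p.units_sold for p in catalog)
ranked = sorted(catalog, key=lambda p: boosted_score(p, top_seller), reverse=True)
```

Because the boost multiplies the text score, an irrelevant product with a zero text score stays at zero no matter how well it sells.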

...

Now let’s see how more targeted logic can be applied to specific queries.

Gmail-style filters

Want to add Gmail-style filters to your search bar? For power users these are amazing. See below where the in:cheap syntax is used to filter to lower-priced items. This example is contrived, but it shows how easy this is to do with pipelines:

...

  • in is just an example. You could look for has, from or anything you like.

  • Many additional filters can be added for all the other potential values here. It could be in:sale or anything else.
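
The in:cheap idea can be sketched outside of a pipeline in plain Python. Everything here is hypothetical and only illustrates the mechanism: the `FILTERS` table, the filter expression strings, and the `parse_query` function are invented for this example and are not Sajari syntax.

```python
import re

# Hypothetical mapping from (operator, value) tokens to filter expressions.
FILTERS = {
    ("in", "cheap"): "price < 20",
    ("in", "sale"): "on_sale = true",
}
TOKEN = re.compile(r"\b(\w+):(\w+)\b")


def parse_query(raw: str) -> tuple:
    """Strip recognized operator:value tokens and return (text, filters)."""
    filters = []

    def replace(match):
        key = (match.group(1).lower(), match.group(2).lower())
        if key in FILTERS:
            filters.append(FILTERS[key])
            return ""  # remove the token from the free-text query
        return match.group(0)  # unknown tokens stay in the query untouched

    text = TOKEN.sub(replace, raw)
    return " ".join(text.split()), filters


text, filters = parse_query("in:cheap running shoes")
```

Supporting another operator such as has or from only means adding entries to the table.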

Natural language parsing

We have two ways to do this: a) basic pattern matches and b) more complex models. This example uses the first option to show how some basic capabilities are easily addressed:

...

Note here that the first step transformed the initial query and the second step conditionally activates the filter when appropriate. This is interesting as the price input could actually be sent in as part of the initial query (i.e. not extracted). This is a common use case for personalisation and recommendations where the gender, size or preferences may be known. 
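
The two-step shape described above (one step rewrites the query, a second step conditionally applies a filter) can be sketched in Python as follows. The regular expression, the `max_price` parameter name, and the filter string are assumptions made for this illustration rather than the configuration used in the example.

```python
import re

# Hypothetical pattern: matches phrases such as "under $50" or "under 19.99".
PRICE_PATTERN = re.compile(r"\bunder \$?(\d+(?:\.\d+)?)\b", re.IGNORECASE)


def extract_price(params: dict) -> dict:
    """Step 1: move a price phrase out of the query text into a parameter."""
    out = dict(params)
    match = PRICE_PATTERN.search(out.get("q", ""))
    if match:
        out["max_price"] = float(match.group(1))
        out["q"] = " ".join(PRICE_PATTERN.sub("", out["q"]).split())
    return out


def apply_price_filter(params: dict) -> dict:
    """Step 2: add a filter only when a maximum price is present."""
    out = dict(params)
    if "max_price" in out:
        out["filter"] = f"price < {out['max_price']}"
    return out


# Price extracted from the query text.
extracted = apply_price_filter(extract_price({"q": "shoes under $50"}))
# Price supplied directly with the request, with no extraction needed.
supplied = apply_price_filter(extract_price({"q": "shoes", "max_price": 30.0}))
```

Because the second step only checks for the parameter, it behaves the same whether the value was extracted from the query or sent in by the caller.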

Product segmentation

If someone searches for “tv” they probably want a tv and not a tv aerial, or a tv cabinet, or something “as seen on tv.” I say “probably” because search and discovery is all about ambiguity. This alignment of queries to some form of segmentation is very valuable. Below is an example of a query for “tv” both before and after applying a conditional boost to make this alignment.

...


So the step we used previously to boost the tv category would now be largely unnecessary, as the performance would create this relationship for every query where a statistically significant relationship is detected. It’s less specific than a hand-written rule, but it’s far more accurate than any one person can determine!
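
To give a feel for how a query-to-category relationship could be learned rather than hand-written, here is a deliberately crude Python sketch. It is not Sajari’s method: the `CategoryAffinity` class is hypothetical, it assumes click events are the performance signal, and it substitutes two simple thresholds (minimum clicks and minimum share) for a real statistical significance test.

```python
from collections import Counter, defaultdict
from typing import Optional


class CategoryAffinity:
    """Toy model: boost a category once clicks for a query clearly favor it."""

    def __init__(self, min_clicks: int = 20, min_share: float = 0.6) -> None:
        self.min_clicks = min_clicks  # crude stand-in for a significance test
        self.min_share = min_share
        self.clicks = defaultdict(Counter)

    def record_click(self, query: str, category: str) -> None:
        self.clicks[query.lower()][category] += 1

    def boosted_category(self, query: str) -> Optional[str]:
        counts = self.clicks.get(query.lower(), Counter())
        total = sum(counts.values())
        if total < self.min_clicks:
            return None  # not enough evidence yet
        category, top = counts.most_common(1)[0]
        return category if top / total >= self.min_share else None


affinity = CategoryAffinity()
for _ in range(25):
    affinity.record_click("tv", "televisions")
for _ in range(5):
    affinity.record_click("tv", "furniture")
boost_for_tv = affinity.boosted_category("tv")
```

A query with little or evenly spread click data returns no category, so nothing is boosted until the evidence is clear.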

Record pipelines

The record pipeline can update and augment information as it is indexed. 

...

For this example build, we’re focusing on the last of these, image analysis for visual search, which is explained further below.

Visual search

You may have noticed the color palette in the facet and filters menu. This was generated using the Google Cloud Vision API, which is powered by advanced AI image analysis. The color information and image descriptions are AI generated and were not in the original data set. 

The image analysis was done as each product was loaded. Uniquely, this was done by a pipeline step calling out to a cloud function. Why is that important? Because it shows you can augment product processing with anything you like. Note that these features are available even if you’re using a third-party service such as Shopify to host your store.
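
The pattern of a record pipeline step that calls out to an external function can be sketched in Python like this. All names are hypothetical: `run_record_pipeline`, `make_image_analysis_step`, and the `image_url` and `colors` fields are invented for the illustration, and `fake_analyze` stands in for a cloud function wrapping an image analysis API.

```python
from typing import Callable, List

Record = dict
Step = Callable[[Record], Record]


def run_record_pipeline(record: Record, steps: List[Step]) -> Record:
    """Pass the record through each step in order as it is indexed."""
    for step in steps:
        record = step(record)
    return record


def make_image_analysis_step(analyze: Callable[[str], dict]) -> Step:
    """Build a step that augments a record using an external analyzer."""

    def step(record: Record) -> Record:
        out = dict(record)
        if "image_url" in out:
            analysis = analyze(out["image_url"])
            out["image_tags"] = analysis["labels"]
            out["colors"] = analysis["colors"]
        return out

    return step


def fake_analyze(image_url: str) -> dict:
    # Stand-in for a call to a cloud function; returns canned results.
    return {"labels": ["sneaker"], "colors": ["red"]}


product = {"title": "Runner", "image_url": "https://example.com/runner.jpg"}
indexed = run_record_pipeline(product, [make_image_analysis_step(fake_analyze)])
```

Since the analyzer is passed in as a plain function, any other enrichment service could be swapped in without changing the pipeline itself.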

Sync product data

There are a couple of different ways to sync your data with Sajari:

...

People have commented in the past that the updates seemed too fast to be consistent, but they are! We achieve this by using upserts to create and sync records. The record isn’t added to a buffer to be merged later; instead, the differential is calculated and executed within the request-response sequence. This makes changes available as soon as they are synchronized. If you’re using Elasticsearch or one of the Lucene variants, you will love this feature.
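
As a simplified picture of an upsert that computes and applies a differential inside the request, consider the Python sketch below. It uses an in-memory dict as the store and a hypothetical `upsert` function; it illustrates the idea only and says nothing about how Sajari stores or indexes records.

```python
def upsert(store: dict, key: str, incoming: dict) -> dict:
    """Create the record if missing, otherwise apply only the changed fields.

    Returns the fields that were written, so the caller sees the result
    of the change in the same request.
    """
    existing = store.get(key)
    if existing is None:
        store[key] = dict(incoming)
        return dict(incoming)
    # The differential: fields that are new or whose values changed.
    diff = {
        field: value
        for field, value in incoming.items()
        if field not in existing or existing[field] != value
    }
    existing.update(diff)  # applied immediately, nothing is buffered
    return diff


store = {}
created = upsert(store, "sku-1", {"title": "Boots", "price": 80})
changed = upsert(store, "sku-1", {"title": "Boots", "price": 60})
```

The second call writes only the price, and the updated record can be read back as soon as the call returns.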

Generating a UI

So far we have ingested and categorized data intelligently, and built smart query pipelines with deep functionality. Now we need to add it to our site and make it shine for a great user experience.

...

In the above example the filter is active, but the other facet counts are still displayed, even though the items are not in the result set. This allows the UI to show the user what else is available, even if not selected. This is optional though.
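
One common way to get this behavior is to count each facet with every active filter applied except the filter on that facet itself. The Python sketch below shows that approach on an in-memory list; the `search_with_facets` function and the sample products are hypothetical, and this is not a description of how the Sajari UI or API computes counts.

```python
from collections import Counter


def search_with_facets(products: list, filters: dict, facet_fields: list) -> tuple:
    """Return (results, facets); filters maps a field to its allowed values."""

    def matches(product: dict, active: dict) -> bool:
        return all(product.get(f) in allowed for f, allowed in active.items())

    results = [p for p in products if matches(p, filters)]
    facets = {}
    for field in facet_fields:
        # Ignore the filter on this facet so unselected values keep counts.
        others = {f: v for f, v in filters.items() if f != field}
        facets[field] = Counter(
            p[field] for p in products if field in p and matches(p, others)
        )
    return results, facets


products = [
    {"name": "A", "color": "red", "brand": "X"},
    {"name": "B", "color": "blue", "brand": "X"},
    {"name": "C", "color": "red", "brand": "Y"},
]
results, facets = search_with_facets(
    products, {"color": {"red"}}, ["color", "brand"]
)
```

Here the color facet still reports a count for blue even though only red items are in the result set, while the brand counts reflect the active color filter.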

Conclusion

In less time than it took to write this article, we’ve shown how to build a blazing-fast e-commerce search and discovery interface with autosuggest, sorting, some basic NLP, machine-learning-powered automatic result improvement, automated categorical segment alignment, AI-generated visual image search, spell correction, Gmail-style filters, conditional business logic, and much more!

...

Documentation

Pipelines Overview
