Full-Text Search in SingleStoreDB

RK

Russ Kochevar

Principal Solutions Engineer

Full-Text Search in SingleStoreDB

We’re demonstrating the power of full-text search capabilities  in SingleStoreDB — complete with code and examples for you to give it a try.

SingleStoreDB is a real-time, distributed SQL database that handles both transactional (OLTP) and analytical (OLAP) workloads in a single, unified database. SingleStoreDB is a multi-model database that natively allows you to store and retrieve several different data types including relational, document, time-series and geospatial, along with providing full-text search capabilities. 

SingleStoreDB has supported full-text search (FTS) since 2018. Full-text search refers to the ability to search text inside text data stored in a database table, returning results that contain some or all of the words from the query that was executed. This differs from a traditional query that returns only an exact match of the data. 

For traditional databases, full-text searches can be challenging — which often leads to the development of specialty databases being used strictly for this purpose.  SingleStoreDB helps customers simplify their data landscape by integrating full-text search capabilities into the core functionality of the database engine.

how-does-full-text-search-workHow Does Full-Text Search Work?

So, how do full-text searches in SingleStoreDB work?  As previously described, full-text search allows for the searching of words or phrases in a large body of text. For full-text searches to work, an index must be created on a text data type. In SingleStoreDB, text data types include char, varchar, text and long text.  Full-text indexes are only supported on columnstore tables, and they must be created as part of the CREATE TABLE process using the FULLTEXT index type. 

how-do-we-do-itHow Do We Do It?

Full-text search takes advantage of SingleStoreDB’s patented Universal Storage, a unified table type that supports both OLTP and OLAP workloads. The columnstore table is a perfect table type for full-text indexes because a columnstore table lends itself to very fast scans of individual — or indexed — columns. Columnstore tables are also ideal for great compression. In fact, it's not uncommon to see SingleStoreDB compression of up to 90% on a columnstore table.  

What will full-text search allow you to do in SingleStoreDB? You can search for an exact phrase or word, or you can complete “fuzzy” searches.  A fuzzy search is similar to a query with a “LIKE” filter in the where clause, and allows you to find words or phrases that are similar to the text you are searching for. 

examplesExamples

The following tutorial will walk you through an example of creating a full-text index and doing several searches.

Step 1 — Create a database, table and use SingleStore Pipelines to load some example data.

CREATE DATABASE fulltextdb;
USE fulltextdb;

CREATE TABLE jeopardy_questions (
  show_number
text,
  air_date
date,
  
round text,
  category
text,
  
value text,
  question
text,
  answer
text,
  
KEY (category) using clustered columnstore,
  
FULLTEXT (question, answer)
);

CREATE PIPELINE load_questions
AS LOAD DATA S3 's3://testdata.memsql.com/full-text/*'
CONFIG '{"region": "us-east-1"}'
SKIP PARSER ERRORS
INTO TABLE jeopardy_questions
FIELDS TERMINATED BY
','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY "\r\n"
IGNORE 1 LINES
(
  show_number,
  @air_date,
  
round,
  category,
  
value,
  question,
  answer
)
set air_date = to_date(@air_date,'YYYY-MM-DD');

START
PIPELINE load_questions;

OPTIMIZE TABLE jeopardy_questions flush;

Step 2 — Perform a full-text search using the function MATCH AGAINST.

Select question From jeopardy_questions
Where MATCH (question) AGAINST ('baseball') and category =
'SPORTS';

This example was a simple Select statement using MATCH AGAINST with one column.  Match can also be used in UPDATE and DELETE queries, and can be used on multiple column names.

SingleStoreDB also supports the use of various operators in the expression used in the MATCH AGAINST function. Here are a few examples of the operators that can be used.

+A leading plus sign (+) indicates the word that follows the plus sign must be present in each row returned.
-A leading minus sign (-) indicates the word that follows the minus sign must not be present in any of the rows returned.
*,?The asterisk and question mark operators act as wildcards.  A single wildcard character looks for terms with the single character replaced. Multiple wildcard characters look for zero or more characters.
~The tilde symbol is used to support fuzzy searches. To do a fuzzy search, use the tilde symbol at the end of a single-word term.

Step 3 — Perform a full-text search using the fuzzy search operator.

-- fuzzy search with the term singlestore

Select question from jeopardy_questions
Where MATCH (question) AGAINST ('singlestore~');

Doing a full-text fuzzy search on ‘singlestore’ returned rows that had words inlcuding: order, Singapore, singleton, Singleton’s, Silverstone and milestone, among others.

You can also use full-text search when doing real-time analytical queries in SingleStoreDB.  In our final example, we will do a full-text search with an aggregation query to find the top 10 dollar values for questions that contain the word ‘baseball’ — returning results in milliseconds.

-- Combine a aggregation and full-text search query to find the top
10 values of questions that contain the word baseball

Select value, count(*) "Questions" From jeopardy_questions
Where MATCH (question) AGAINST ('baseball')
Group by value
Order
by 2 Desc
Limit
10;

conclusionConclusion

SingleStoreDB full-text search makes it easy to build powerful search capabilities into your transactional and analytical applications on our highly performant HTAP database.  But don’t take my word for it —  give SingleStoreDB a try for free.


Share