Tutorial on building a search engine that supports advanced search syntax
It should support AND, OR, and NOT, perhaps brackets and wildcards, and it should be forgiving of plurals.
Let’s start with what “advanced” means.
- It should support AND, OR, NOT; and perhaps brackets.
- Another part of it, though, is about optimization and fuzzy searches.
- Fast even for a large body of text.
- Recognizes plurals.
- Forgiving of minor typos.
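The last two points can be illustrated with a small sketch. It assumes a deliberately naive English plural rule and plain Levenshtein distance; `singularize`, `edit_distance`, `fuzzy_match` and the `max_typos` parameter are names made up for this example, not part of any library mentioned here.

```python
def singularize(word: str) -> str:
    """Naive English plural stripping. An assumption for illustration, not a real stemmer."""
    w = word.lower()
    if len(w) > 4 and w.endswith("ies"):
        return w[:-3] + "y"          # queries -> query
    if w.endswith(("sses", "xes", "ches", "shes")):
        return w[:-2]                # boxes -> box, classes -> class
    if len(w) > 2 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]                # engines -> engine
    return w


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def fuzzy_match(query: str, word: str, max_typos: int = 1) -> bool:
    """True if the two words agree after plural stripping, within max_typos edits."""
    return edit_distance(singularize(query), singularize(word)) <= max_typos
```

With these rules, `fuzzy_match("engines", "engine")` and `fuzzy_match("searh", "search")` both succeed. Note that a swapped pair of letters counts as two edits under plain Levenshtein distance, so catching transpositions needs either a higher `max_typos` or a different distance.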
Advanced search syntaxes
I have thought about this a lot in the past.
- Letting users search the database with a simple one-liner string (and let user decide which field to search)
- What features would you want for a q querystring parser? (e.g. full-text search, or more?)
The easiest way is to use lunr.js's syntaxes.
- Default connector is
- To make an
- Search is normally case-insensitive, i.e. `a` and `A` mean the same thing.
- `+expression` means an exact, case-sensitive match.
- Not only `:`, but also `<` is used to specify comparison. For example,
- Date comparison is enabled.
- Special keyword:
- `+1h` means the next 1 hour.
- `-1h` means 1 hour ago.
- Available units are
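To make these rules concrete, here is a minimal sketch of how a single query token could be parsed. Much of it is assumed rather than taken from the original syntax: the `Term` class, the `>` operator, the field names in the usage note, and above all the `UNITS` table, since the real list of units is not given here. Relative times such as `+1h` are only recognized in the value position (after an operator), so that a leading `+` on a bare token can still mean "exact match".

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Union

# Hypothetical unit table; the actual set of supported units is an assumption.
UNITS = {"h": timedelta(hours=1), "d": timedelta(days=1), "w": timedelta(weeks=1)}

TERM_RE = re.compile(r"^(?P<field>\w+)(?P<op>[:<>])(?P<value>.+)$")
RELATIVE_RE = re.compile(r"^(?P<sign>[+-])(?P<n>\d+)(?P<unit>[a-z]+)$")


@dataclass
class Term:
    field: Optional[str]            # None means "search every field"
    op: str                         # ":", "<" or ">" (">" is assumed)
    value: Union[str, datetime]
    exact: bool = False             # True for "+expression"


def parse_value(raw: str, now: datetime) -> Union[str, datetime]:
    """Turn "+1h" / "-1h" into a datetime; lower-case everything else."""
    m = RELATIVE_RE.match(raw)
    if m and m["unit"] in UNITS:
        delta = int(m["n"]) * UNITS[m["unit"]]
        return now + delta if m["sign"] == "+" else now - delta
    return raw.lower()              # ordinary values are case-insensitive


def parse_term(token: str, now: Optional[datetime] = None) -> Term:
    now = now or datetime.now()
    if token.startswith("+") and len(token) > 1:
        # Exact, case-sensitive match: keep the text untouched.
        return Term(None, ":", token[1:], exact=True)
    m = TERM_RE.match(token)
    if m:
        return Term(m["field"], m["op"], parse_value(m["value"], now))
    return Term(None, ":", token.lower())
```

For example, with `now` fixed at noon, `parse_term("due<+1h", now)` yields a comparison against 13:00, while `parse_term("+Apple")` keeps its capital letter and `parse_term("Apple")` is lower-cased to `apple`.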
You can see my experiment and playground here.
Full text search and fuzzy search
- Elasticsearch, Lucene, Solr
- Google custom search
How does it compare to search engines with web crawlers?
- lunr, elasticlunr
RDBMS and NoSQL's feature?
- SQLite FTS4, FTS5
- PostgreSQL plugin
Or, some other implementations, like Python's Whoosh.
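Of the options listed, SQLite's FTS5 is probably the quickest to try, because Python's standard `sqlite3` module can use it when the underlying SQLite build includes FTS5 (most do, but that is an assumption about your environment). The sketch below uses a made-up `docs` table with three sample rows. The `porter` tokenizer stems English words, which is what makes `engine` match `engines`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# An FTS5 virtual table; "porter unicode61" adds English stemming on top of
# the default Unicode tokenizer.
conn.execute(
    "CREATE VIRTUAL TABLE docs USING fts5(title, body, tokenize='porter unicode61')"
)
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("Search engines", "Building search engines with advanced syntax"),
        ("Web crawler", "A crawler feeds pages to a search engine"),
        ("Cooking", "Recipes for pasta"),
    ],
)


def search(query: str) -> list:
    """Run an FTS5 query and return matching titles in insertion order."""
    rows = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rowid", (query,)
    )
    return [title for (title,) in rows]
```

FTS5 already understands `AND`, `OR`, `NOT` and parentheses, so `search("engine NOT crawler")` and `search("(pasta OR crawler) AND engine")` work without a custom parser. Note that `NOT` is a binary operator in FTS5, so it is written `a NOT b` rather than `a AND NOT b`.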
Implementing both together
This is easier if you rely on the full-text features built into an RDBMS or NoSQL database. PostgreSQL, MySQL and MongoDB let you create a full-text index directly on a text column; SQLite does not, and needs a separate FTS4 or FTS5 virtual table instead.
Furthermore, PostgreSQL also has PGroonga, which not only supports more languages than the native tsvector, but can also index anything, including
Now comes the algorithm for the syntax. I made it for PostgreSQL in another project.
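The PostgreSQL implementation itself is not reproduced here. As a rough idea of what such an algorithm can look like, the following is a minimal recursive-descent sketch, not the code from that project. It assumes explicit upper-case `AND`, `OR` and `NOT` operators plus brackets, and deliberately rejects two adjacent words, because which default connector to use is a design decision this sketch does not make. The function names and the tuple-based tree are invented for this example. `to_tsquery_text` renders the tree with the `&`, `|` and `!` operators that PostgreSQL's `to_tsquery` accepts.

```python
import re


def parse(query: str):
    """Parse 'a AND (b OR NOT c)' into nested tuples. NOT binds tightest, then AND, then OR."""
    tokens = re.findall(r"\(|\)|\w+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        pos += 1
        return tok

    def parse_or():
        node = parse_and()
        while peek() == "OR":
            take()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "AND":
            take()
            node = ("AND", node, parse_not())
        return node

    def parse_not():
        tok = take()
        if tok == "NOT":
            return ("NOT", parse_not())
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        return ("WORD", tok.lower())   # search words are case-insensitive

    tree = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree


def matches(tree, words: set) -> bool:
    """Evaluate the tree against a set of lower-cased words from one document."""
    kind = tree[0]
    if kind == "WORD":
        return tree[1] in words
    if kind == "NOT":
        return not matches(tree[1], words)
    left, right = matches(tree[1], words), matches(tree[2], words)
    return (left and right) if kind == "AND" else (left or right)


def to_tsquery_text(tree) -> str:
    """Render the tree as text suitable for PostgreSQL's to_tsquery."""
    kind = tree[0]
    if kind == "WORD":
        return tree[1]
    if kind == "NOT":
        return "!" + to_tsquery_text(tree[1])
    op = "&" if kind == "AND" else "|"
    return f"({to_tsquery_text(tree[1])} {op} {to_tsquery_text(tree[2])})"
```

For instance, `parse("search AND (lunr OR NOT crawler)")` can be checked in memory with `matches(tree, {"search", "engine"})`, or passed to the database as `(search & (lunr | !crawler))`. Keeping the parsed tree separate from its output format means the same parser could feed a different backend later.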
polv - programming and medicine. Interested in Japanese and Chinese languages