Indexes

To answer full-text search queries quickly, Manticore needs to build a special data structure, optimized for such queries, from your text data. This structure is called an index, and the process of building an index from text is called indexing.

An index identifier must be a single word that contains only letters, numbers and underscores. It must start with a letter.
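The naming rule above can be checked mechanically. The following is a minimal sketch, not part of Manticore: the function name is_valid_index_name is hypothetical, and restricting "letters" to ASCII is an assumption of this example.

```python
import re

# Pattern derived from the rule above: a letter first, then any mix of
# letters, digits and underscores. ASCII-only letters are an assumption.
_INDEX_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_index_name(name: str) -> bool:
    """Return True if `name` satisfies the identifier rule described above."""
    return _INDEX_NAME.fullmatch(name) is not None
```

Under these assumptions, a name such as products_2024 is accepted, while names that start with a digit or contain a space are rejected.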

Different index types are well suited for different tasks. For example, a disk-based tree-based index would be easy to update (i.e. to insert new documents into an existing index), but rather slow to search. Manticore's architecture allows different index types, or backends, to be implemented comparatively easily.

Manticore provides two different backends: a disk index backend and an RT (real-time) index backend.

Offline/plain indexes

Disk indexes are designed to provide maximum indexing and searching speed, while keeping the RAM footprint as low as possible. That comes at the cost of text index updates. You cannot update an existing document or incrementally add a new document to a disk index. You can only batch rebuild the entire disk index from scratch. (Note that you can still update document attributes on the fly, even with disk indexes.)

This “rebuild only” limitation might look like a big constraint at first glance. In practice, it can very often be worked around rather easily by setting up multiple disk indexes, searching through them all, and rebuilding only the one that holds the small fraction of most recently changed data. See Live index updates for details.
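The workaround above can be illustrated with a toy model. This is a conceptual sketch in plain Python, not Manticore code: build_index and search are hypothetical helpers, each "index" is a simple word-to-document-ids map, and the model ignores changes to documents that already live in the large index.

```python
def build_index(docs):
    """Toy batch rebuild: map each lowercase word to the ids of docs containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def search(indexes, word):
    """Search every index in the list and union the matching document ids."""
    hits = set()
    for index in indexes:
        hits |= index.get(word.lower(), set())
    return sorted(hits)


main_docs = {1: "red apple", 2: "green pear"}
delta_docs = {3: "red cherry"}

main = build_index(main_docs)    # large index, rebuilt rarely
delta = build_index(delta_docs)  # small index with recent data, rebuilt often

# A new document arrives: only the small index is rebuilt from scratch.
delta_docs[4] = "red plum"
delta = build_index(delta_docs)

result = search([main, delta], "red")
```

The point of the design is that the cost of a rebuild is proportional to the size of the small index only, while queries still see the full data set.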

Real-Time indexes

RT indexes enable you to implement dynamic updates and incremental additions to the full-text index. RT stands for Real Time, and these indexes are indeed “soft realtime” in terms of writes: most index changes become available for searching within 1 millisecond or less, but writes can occasionally stall for seconds. (Searches still work even during such an occasional write stall.) Refer to Real-time indexes for details.

Distributed indexes

Manticore supports so-called distributed indexes. Unlike disk and RT indexes, these are not a real physical backend, but rather just lists of local or remote indexes that can be searched transparently to the application. Manticore does all the chores of sending search requests to remote machines in the cluster, aggregating the result sets, retrying failed requests, and even doing some load balancing. See Distributed searching for a discussion of distributed indexes.
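The chores listed above (fan-out, retries, aggregation) can be sketched in a few lines. This is only an illustration under stated assumptions, not how Manticore implements it: search_distributed, local_shard and FlakyShard are hypothetical names, each shard is modelled as a callable returning (doc_id, weight) pairs, shards are queried sequentially rather than in parallel, and load balancing is left out.

```python
import heapq


def search_distributed(shards, query, limit=3, retries=1):
    """Query every shard, retry failed requests, and merge matches by weight."""
    merged = []
    for shard in shards:
        for attempt in range(retries + 1):
            try:
                merged.extend(shard(query))
                break
            except ConnectionError:
                if attempt == retries:
                    raise  # give up after the last allowed retry
    # Aggregate: keep the `limit` best matches across all shards.
    return heapq.nlargest(limit, merged, key=lambda match: match[1])


def local_shard(query):
    """Stand-in for a local index."""
    return [(1, 0.9), (2, 0.4)]


class FlakyShard:
    """Stand-in for a remote index whose first request fails."""

    def __init__(self):
        self.calls = 0

    def __call__(self, query):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("simulated network failure")
        return [(7, 0.7), (8, 0.1)]


flaky = FlakyShard()
top = search_distributed([local_shard, flaky], "apple")
```

The application only sees the merged list in top; the failed first request to the remote shard is retried without the caller noticing.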

Template indexes

Template indexes are indexes with no storage backend. They can be used for operations that involve only data from the input, such as keyword and snippet generation.

Percolate indexes

Percolate indexes are special Real-Time indexes that store queries instead of documents. They are used for prospective searches (or “search in reverse”). Refer to Percolate query for more details.
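The idea of searching in reverse can be shown with a small model. This is a conceptual sketch only, not Manticore's percolate API: ToyPercolator is a hypothetical class, and a stored query is simplified to a set of words that must all appear in the document.

```python
class ToyPercolator:
    """Stores queries and reports which of them match an incoming document."""

    def __init__(self):
        self._queries = {}

    def add_query(self, query_id, text):
        # A stored query is modelled as a set of words that must all be present.
        self._queries[query_id] = set(text.lower().split())

    def percolate(self, document):
        """Return the ids of all stored queries that match `document`."""
        words = set(document.lower().split())
        return sorted(qid for qid, terms in self._queries.items() if terms <= words)


pq = ToyPercolator()
pq.add_query("q1", "red apple")
pq.add_query("q2", "green pear")
pq.add_query("q3", "apple")

matched = pq.percolate("A fresh red apple from the market")
```

Here the roles are swapped compared to a regular search: the queries are the stored data, and the document is what gets submitted at search time.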

There can be as many indexes per configuration file as necessary. The indexer utility can reindex either all of them (if the --all option is specified) or an explicitly specified subset. The searchd utility serves all the specified indexes, and clients can specify which indexes to search at run time.

Index files

Each index consists of a number of files.

Plain indexes and Real-Time index chunks:

Extension   Stores                  Memory management
spa         scalar attrs            see ondisk_attrs
spd         document lists          on disk, gets cached by OS
spi         dictionary              always loaded in memory
sph         index/chunk header      always loaded in memory
spk         kill list               always loaded in memory
spl         index lock file         on disk only
spm         MVA attrs               see ondisk_attrs
spp         keyword positions       on disk, gets cached by OS
sps         string/json attrs       see ondisk_attrs
mvp         MVA attrs updates [1]   always loaded in memory

[1] Created only in case of persistent MVA updates.

Real-Time indexes also have:

Extension   Stores                  Memory management
kill        RT kill [1]             on disk only
meta        RT header               always loaded in memory
lock        RT lock file            on disk only
ram         RAM chunk copy [2]      on disk only

[1] RT kill: documents that get REPLACEd. It is cleared when the RAM chunk is dumped as a disk chunk.

[2] RAM chunk copy: created when the RAM chunk is flushed to disk. It is cleared when the RAM chunk is dumped as a disk chunk.
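The two tables above can be turned into a small lookup for inspecting an index directory. This is a convenience sketch, not a Manticore tool: INDEX_FILES and describe_index_file are hypothetical names, and file names such as products.spd are made up for illustration.

```python
from pathlib import PurePath

# Extension -> (what the file stores, how it is kept in memory),
# transcribed from the two tables above.
INDEX_FILES = {
    "spa": ("scalar attrs", "see ondisk_attrs"),
    "spd": ("document lists", "on disk, gets cached by OS"),
    "spi": ("dictionary", "always loaded in memory"),
    "sph": ("index/chunk header", "always loaded in memory"),
    "spk": ("kill list", "always loaded in memory"),
    "spl": ("index lock file", "on disk only"),
    "spm": ("MVA attrs", "see ondisk_attrs"),
    "spp": ("keyword positions", "on disk, gets cached by OS"),
    "sps": ("string/json attrs", "see ondisk_attrs"),
    "mvp": ("MVA attrs updates", "always loaded in memory"),
    "kill": ("RT kill", "on disk only"),
    "meta": ("RT header", "always loaded in memory"),
    "lock": ("RT lock file", "on disk only"),
    "ram": ("RAM chunk copy", "on disk only"),
}


def describe_index_file(filename):
    """Return (stores, memory management) for a file, or None if unknown."""
    extension = PurePath(filename).suffix.lstrip(".")
    return INDEX_FILES.get(extension)


# Extensions whose contents are always held in RAM, per the tables.
always_in_memory = sorted(
    ext for ext, (_, memory) in INDEX_FILES.items()
    if memory == "always loaded in memory"
)
```

A lookup like this can help estimate which files contribute to the resident memory footprint and which are only read from disk.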