Index merging¶
Merging two existing indexes can be more efficient than indexing the
data from scratch, and desired in some cases (such as merging ‘main’ and
‘delta’ indexes instead of simply reindexing ‘main’ in ‘main+delta’
partitioning scheme). So indexer
has an option to do that. Merging
the indexes is normally faster than reindexing but still not instant
on huge indexes. Basically, it will need to read the contents of both
indexes once and write the result once. Merging 100 GB and 1 GB index,
for example, will result in 202 GB of IO (but that’s still likely less
than the indexing from scratch requires).
The basic command syntax is as follows:
indexer --merge DSTINDEX SRCINDEX [--rotate]
Only the DSTINDEX index will be affected: the contents of SRCINDEX will
be merged into it. --rotate
switch will be required if DSTINDEX is
already being served by searchd
. The initially devised usage pattern
is to merge a smaller update from SRCINDEX into DSTINDEX. Thus, when
merging the attributes, values from SRCINDEX will win if duplicate
document IDs are encountered. Note, however, that the “old” keywords
will not be automatically removed in such cases. For example, if
there’s a keyword “old” associated with document 123 in DSTINDEX, and a
keyword “new” associated with it in SRCINDEX, document 123 will be found
by both keywords after the merge. You can supply an explicit condition
to remove documents from DSTINDEX to mitigate that; the relevant switch
is --merge-dst-range
:
indexer --merge main delta --merge-dst-range deleted 0 0
This switch lets you apply filters to the destination index along with merging. There can be several filters; all of their conditions must be met in order to include the document in the resulting merged index. In the example above, the filter passes only those records where ‘deleted’ is 0, eliminating all records that were flagged as deleted (for instance, using UpdateAttributes() call).