Boolean query syntax¶
Boolean queries allow the following special operators to be used:
- operator OR:
hello | world
- operator NOT:
hello -world
hello !world
- grouping:
( hello world )
Here’s an example query which uses all these operators:
Example 5.1. Boolean query example
( cat -dog ) | ( cat -mouse)
There always is implicit AND operator, so “hello world” query actually means “hello & world”.
OR operator precedence is higher than AND, so “looking for cat | dog | mouse” means “looking for ( cat | dog | mouse )” and not “(looking for cat) | dog | mouse”.
Queries may be automatically optimized if OPTION boolean_simplify=1 is specified. Some transformations performed by this optimization include:
- Excess brackets: ((A | B) | C) becomes ( A | B | C ); ((A B) C) becomes ( A B C )
- Excess AND NOT: ((A !N1) !N2) becomes (A !(N1 | N2))
- Common NOT: ((A !N) | (B !N)) becomes ((A|B) !N)
- Common Compound NOT: ((A !(N AA)) | (B !(N BB))) becomes (((A|B) !N) | (A !AA) | (B !BB)) if the cost of evaluating N is greater than the added together costs of evaluating A and B
- Common subterm: ((A (N | AA)) | (B (N | BB))) becomes (((A|B) N) | (A AA) | (B BB)) if the cost of evaluating N is greater than the added together costs of evaluating A and B
- Common keywords: (A | “A B”~N) becomes A; (“A B” | “A B C”) becomes “A B”; (“A B”~N | “A B C”~N) becomes (“A B”~N)
- Common phrase: (“X A B” | “Y A B”) becomes ((“X|Y”) “A B”)
- Common AND NOT: ((A !X) | (A !Y) | (A !Z)) becomes (A !(X Y Z))
- Common OR NOT: ((A !(N | N1)) | (B !(N | N2))) becomes (( (A !N1) | (B !N2) ) !N)
Note that optimizing the queries consumes CPU time, so for simple queries -or for hand-optimized queries- you’ll do better with the default boolean_simplify=0 value. Simplifications are often better for complex queries, or algorithmically generated queries.
Queries like “-dog”, which implicitly include all documents from the collection, can not be evaluated. This is both for technical and performance reasons. Technically, Manticore does not always keep a list of all IDs. Performance-wise, when the collection is huge (ie. 10-100M documents), evaluating such queries could take very long.