Please note that this feature is in preview stage. Some functionality may not be complete yet and may change. Read the changelogs of future updates carefully to avoid possible breakage.
Manticore search daemon can replicate a write transaction
in an index to other nodes in the cluster. Currently only percolate indexes are supported; RT index support is in progress.
Only Linux packages and builds support replication; Windows and macOS packages do not.
We took advantage of Percona’s fork of the Galera library, which gives the following benefits:
- true multi-master - read and write to any node at any time
- synchronous replication - no slave lag, no data is lost after a node crash
- hot standby - no downtime during failover (since there is no failover)
- tightly coupled - all the nodes hold the same state. No diverged data between nodes allowed
- automatic node provisioning - no need to manually back up the database and restore it on a new node
- easy to use and deploy
- detection and automatic eviction of unreliable nodes
- certification based replication
To use replication in Manticore search:
- the daemon should be built with replication support (enabled in the builds Manticore provides)
- the data_dir option should be set in the searchd section of the config
- there should be a listen directive for the replication protocol containing an external IP address, which must not be 0.0.0.0, along with a port range. These “address - port range” pairs should be different for all the daemons on the same box. As a rule of thumb, the port range should specify no fewer than two ports per cluster.
- there should be at least one listen directive for the SphinxAPI protocol
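The listen requirements above can be checked mechanically. The following is a minimal Python sketch, not part of Manticore. It assumes the replication listener value is written as `address:first_port-last_port:replication` (the `:replication` suffix appears in the SHOW STATUS output later in this document; the range form is an assumption), and the helper name `check_replication_listen` is hypothetical.

```python
def check_replication_listen(value, clusters=1):
    """Return a list of problems found in one replication listen value.

    Assumed format (not confirmed by this document):
    "address:first_port-last_port:replication".
    """
    parts = value.strip().split(":")
    if len(parts) != 3 or parts[2] != "replication":
        return ["expected address:first-last:replication"]
    address, port_range = parts[0], parts[1]
    problems = []
    if address == "0.0.0.0":
        problems.append("address must be an external IP, not 0.0.0.0")
    first, sep, last = port_range.partition("-")
    if not sep or not first.isdigit() or not last.isdigit():
        problems.append("a port range first-last is required")
        return problems
    count = int(last) - int(first) + 1
    # Rule of thumb from the text: at least two ports per cluster.
    if count < 2 * clusters:
        problems.append(f"{count} port(s) is too few for {clusters} cluster(s)")
    return problems
```

An empty list means the value passed these checks. The sketch does not verify that the pair differs from those of other daemons on the same box.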
A replication cluster is a set of nodes among which a write transaction gets replicated.
Replication is configured on a per-index basis. One index can be assigned to only
one cluster. There is no restriction on how many indexes a cluster may have. All
transactions, such as TRUNCATE, in any percolate index belonging to a cluster are
replicated to all the other nodes in the cluster. Replication is multi-master, so
writes to any particular node or to multiple nodes simultaneously work equally well.
Replication cluster configuration options are:
- name - specifies a name for the cluster. Should be unique.
- nodes - a comma-separated list of address:port pairs for all the nodes in the cluster. A node’s API interface should be used for this option. The list can contain the current node’s address too. It is used to join a node to the cluster and to rejoin it after restart.
Creating a cluster
To create a cluster you should set at least its name. For a single cluster, or if the cluster you are creating is the first one, the path option may be omitted; in this case the data_dir option is used as the cluster path. For all subsequent clusters you need to specify path, and this path should be available. The nodes option may also be set to enumerate all the nodes in the cluster.
CREATE CLUSTER posts
CREATE CLUSTER click_query '/var/data/click_query/' as path
CREATE CLUSTER click_query '/var/data/click_query/' as path, 'clicks_mirror1:9312,clicks_mirror2:9312,clicks_mirror3:9312' as nodes
If a cluster is created without the nodes option, the first node that gets joined to the cluster will be saved as nodes.
Joining a cluster
To join an existing cluster, the cluster name and the address of any working node should be set. For a single cluster, path may be omitted and data_dir will be used as the cluster path. For all subsequent clusters, path needs to be set and should be available.
JOIN CLUSTER posts AT '10.12.1.35:9312'
A node joins a cluster by getting the data from the node provided and, if successful, it updates node lists in all the other cluster nodes similar to ALTER CLUSTER … UPDATE nodes. These lists are used to rejoin nodes to the cluster on restart.
There are two lists of nodes. The first is used to rejoin nodes to the cluster on restart; it is updated across all nodes by
ALTER CLUSTER … UPDATE nodes.
JOIN CLUSTER ... AT does the same update automatically.
SHOW STATUS shows this list as cluster_<name>_nodes_set.
The second list is a list of all active nodes used for replication. This list doesn’t require manual management.
ALTER CLUSTER … UPDATE nodes copies this list of nodes to the list of nodes
used to rejoin on restart. SHOW STATUS shows this list as cluster_<name>_nodes_view.
When nodes are located in different network segments or in different datacenters, the nodes option may be set explicitly. This minimizes traffic between nodes and allows gateway nodes to be used for communication between datacenters. The following command joins an existing cluster using the nodes option.
JOIN CLUSTER click_query 'clicks_mirror1:9312;clicks_mirror2:9312;clicks_mirror3:9312' as nodes, '/var/data/click_query/' as path
Note that when this syntax is used, the cluster_<name>_nodes_set list is not updated automatically.
Use ALTER CLUSTER … UPDATE nodes to update it.
The JOIN CLUSTER statement completes when the node has received all the necessary data to be in sync with all the other nodes in the cluster.
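As a rough illustration of the create-then-join flow described above, the following Python sketch builds the statement each node would run: the first node creates the cluster and every other node joins through it. The helper name `bootstrap_plan` is hypothetical, the path and nodes options are omitted (the single-cluster case), and sending the statements to each daemon is left to whatever MySQL-protocol client you use.

```python
def bootstrap_plan(cluster, nodes):
    """Map each node's API address to the statement to run on that node.

    `nodes` is an ordered list of "address:port" strings; the first
    entry is the node that creates the cluster.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    first, rest = nodes[0], nodes[1:]
    plan = {first: f"CREATE CLUSTER {cluster}"}
    for node in rest:
        # Every other node joins through the node that created the cluster.
        plan[node] = f"JOIN CLUSTER {cluster} AT '{first}'"
    return plan


if __name__ == "__main__":
    for node, statement in bootstrap_plan(
        "posts", ["10.12.1.35:9312", "10.12.1.36:9312"]
    ).items():
        print(node, "->", statement)
```

Each JOIN should be issued only after the CREATE on the first node has succeeded, since a joining node pulls its data from the node it is pointed at.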
Deleting a cluster
The DELETE CLUSTER statement removes the cluster specified by name. The cluster gets removed from all the nodes, but its indexes are left intact and become active local non-replicated indexes.
DELETE CLUSTER click_query
ALTER CLUSTER <cluster_name> ADD <index_name> adds an existing local PQ index to the cluster.
The node which receives the ALTER query sends the index to the other nodes in the cluster. All the local
indexes with the same name on the other nodes of the cluster get replaced with the new index.
ALTER CLUSTER <cluster_name> DROP <index_name> forgets about a local PQ index, i.e., it doesn’t remove
the index files on the nodes but just makes it an active non-replicated index.
ALTER CLUSTER click_query ADD clicks_daily_index
ALTER CLUSTER posts DROP weekly_index
ALTER CLUSTER <cluster_name> UPDATE nodes statement updates node lists on each node of the cluster to include
every active node in the cluster. See Joining a cluster for more info on node lists.
ALTER CLUSTER posts UPDATE nodes
For example, suppose the list of nodes used for rejoining the cluster was saved when the cluster was initially created,
and other nodes have joined the cluster since then. The list of active nodes now includes them,
but the list of nodes used for rejoining the cluster is still the same. Running the
ALTER CLUSTER ... UPDATE nodes statement
copies the list of active nodes to the list of nodes used to rejoin on restart. After this, the list of nodes used on restart includes all
the active nodes in the cluster.
Both lists of nodes can be viewed using the SHOW STATUS statement (as nodes_set and nodes_view).
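To decide whether ALTER CLUSTER ... UPDATE nodes is worth running, the two lists can be compared. The sketch below is a hypothetical Python helper that takes SHOW STATUS rows already loaded into a dict. It assumes that entries ending in `:replication` in nodes_view are replication-protocol endpoints and that the rejoin list holds only API addresses; both assumptions are inferred from the sample SHOW STATUS output in this document, not confirmed by it.

```python
def api_nodes(value):
    """Split a comma-separated node list, dropping ':replication' entries."""
    nodes = [n.strip() for n in value.split(",") if n.strip()]
    return [n for n in nodes if not n.endswith(":replication")]


def missing_from_rejoin_list(status, cluster):
    """Return active API nodes that are absent from the rejoin list."""
    nodes_set = api_nodes(status.get(f"cluster_{cluster}_nodes_set", ""))
    nodes_view = api_nodes(status.get(f"cluster_{cluster}_nodes_view", ""))
    return [n for n in nodes_view if n not in nodes_set]


if __name__ == "__main__":
    status = {
        "cluster_post_nodes_set": "10.10.0.1:9312",
        "cluster_post_nodes_view": (
            "10.10.0.1:9312,10.10.0.1:9320:replication,"
            "10.10.1.1:9312,10.10.1.1:9320:replication"
        ),
    }
    print(missing_from_rejoin_list(status, "post"))
```

A non-empty result means some active nodes would not be contacted on restart until the rejoin list is updated.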
All write statements that change the content of a cluster’s index, such as INSERT and TRUNCATE,
should use the cluster_name:index_name expression in place of an index name to make
sure the change is propagated to all replicas in the cluster. An error will be triggered otherwise.
INSERT INTO posts:weekly_index VALUES ( 'iphone case' )
TRUNCATE RTINDEX click_query:weekly_index
Read statements such as SELECT, CALL PQ or DESCRIBE can use either regular index names
not prepended with a cluster name or the cluster_name:index_name syntax.
In read statements the cluster_name:index_name syntax ignores the cluster name and may be used
on an index that doesn’t belong to the cluster.
SELECT * FROM weekly_index
CALL PQ('posts:weekly_index', 'document is here')
Insertion of a percolate query performed at multiple nodes of the same cluster at the same time with auto generated document id may trigger an error as, for now, id auto generation takes into account only the local index. This may generate a duplicate id and replication requires no id conflicts. If an insert fails, retry should work well in most cases, but it depends on the insertion rate.
However, replacement of percolate queries at multiple nodes at the same time with auto generated document
id may cause only the query from the last finished request to be replaced.
In the future, this behavior will be improved by switching to UUIDs.
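Since a failed insert can usually be retried, a client-side retry loop is one way to live with the duplicate id caveat. The following Python sketch is only an illustration under stated assumptions: `do_insert` is a hypothetical callable that performs the INSERT through your client library, and `retry_on` should be narrowed to whatever exception that library raises for a duplicate id (the default catches everything, which is too broad for production use).

```python
import random
import time


def insert_with_retry(do_insert, attempts=5, base_delay=0.05,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call do_insert(), retrying with jittered exponential backoff.

    Re-raises the last error once `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return do_insert()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Random jitter keeps concurrent writers on different nodes
            # from retrying in lockstep and colliding again.
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

The backoff matters because, as noted above, how well retries work depends on the insertion rate: at high rates immediate retries are more likely to hit another conflict.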
SHOW STATUS outputs, among other information, cluster status variables. The output format is
cluster_<name>_<variable_name> variable_value. Most of the variables are described in
Galera Documentation Status.
Additionally we display:
- cluster_name - name of the cluster
- node_state - current state of the node (for example, synced)
- indexes_count - number of indexes managed by the cluster
- indexes - list of index names managed by the cluster
- nodes_set - list of nodes in the cluster defined with cluster CREATE, JOIN or ALTER UPDATE commands
- nodes_view - actual list of nodes in cluster which this node sees
mysql> SHOW STATUS;
+----------------------------+-------------------------------------------------------------------------------------+
| Counter                    | Value                                                                               |
+----------------------------+-------------------------------------------------------------------------------------+
| cluster_name               | post                                                                                |
| cluster_post_state_uuid    | fba97c45-36df-11e9-a84e-eb09d14b8ea7                                                |
| cluster_post_conf_id       | 1                                                                                   |
| cluster_post_status        | primary                                                                             |
| cluster_post_size          | 5                                                                                   |
| cluster_post_local_index   | 0                                                                                   |
| cluster_post_node_state    | synced                                                                              |
| cluster_post_indexes_count | 2                                                                                   |
| cluster_post_indexes       | pq1,pq_posts                                                                        |
| cluster_post_nodes_set     | 10.10.0.1:9312                                                                      |
| cluster_post_nodes_view    | 10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication |
+----------------------------+-------------------------------------------------------------------------------------+
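For monitoring scripts it can be convenient to strip the cluster prefix from these counters. The Python sketch below is a hypothetical helper that works on (counter, value) rows as returned by SHOW STATUS. It assumes a single cluster per node, identified by the cluster_name row, and treats a node as ready when status is primary and node_state is synced, which are the values shown in the sample output above; other values are not interpreted.

```python
def cluster_variables(rows):
    """Return (cluster name, {variable: value}) from SHOW STATUS rows.

    Assumes one cluster per node; the name comes from the cluster_name row.
    """
    status = dict(rows)
    name = status.get("cluster_name")
    if name is None:
        return None, {}
    prefix = f"cluster_{name}_"
    variables = {
        key[len(prefix):]: value
        for key, value in status.items()
        if key.startswith(prefix)
    }
    return name, variables


def is_ready(variables):
    """True when the node reports a primary component and is synced."""
    return (variables.get("status") == "primary"
            and variables.get("node_state") == "synced")
```

For the sample output above, `cluster_variables` yields the name `post` and keys such as `node_state`, `indexes` and `nodes_view`.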
Replication plugin options can be changed using SET statement:
SET CLUSTER click_query GLOBAL 'pc.bootstrap' = 1
See Galera Documentation Parameters for a list of available options.
Restarting a cluster
A replication cluster requires a single node to be started as a
reference point before all the other nodes join it and form a cluster. This is
called cluster bootstrapping; it introduces a
primary component, which the other nodes
then use as a reference point to sync their data from. Restarting a single node,
or reconnecting a node after a shutdown, can be done as usual.
After a shutdown of the whole cluster, the daemon that was stopped last should be started first
with the --new-cluster command line option. To make sure that the daemon is able to
start as a reference point, the
grastate.dat file located at the cluster path
should have the
safe_to_bootstrap option set to 1. That is, both conditions,
--new-cluster and safe_to_bootstrap=1, must be satisfied.
An attempt to start any other node without these options set will trigger an error.
To override this protection and start cluster from another daemon forcibly,
command line option may be used.
In case of a hard crash or an unclean shutdown of all the daemons in the cluster, you need to
identify the most advanced node, that is, the one with the largest
seqno in the grastate.dat file
located at the cluster path, and start that daemon with the command line option that overrides this protection.
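Comparing seqno values by hand across several nodes is error prone, so a small script can help. The following Python sketch assumes grastate.dat uses the plain `key: value` layout with `#` comment lines that Galera writes, and that you have already collected the file contents from each node; the function names and node labels are hypothetical.

```python
def parse_grastate(text):
    """Parse grastate.dat content into a dict of strings."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def can_bootstrap(state):
    """True when the file marks this node as safe to bootstrap from."""
    return state.get("safe_to_bootstrap") == "1"


def most_advanced(files):
    """Given {node: grastate.dat text}, return the node with the largest seqno.

    A missing seqno is treated as -1. If every node reports the same
    value, the first one wins and the choice is not meaningful.
    """
    seqnos = {
        node: int(parse_grastate(text).get("seqno", "-1"))
        for node, text in files.items()
    }
    return max(seqnos, key=seqnos.get)
```

The script only picks a candidate; whether it is safe to force that node to become the reference point remains an operator decision.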
Cluster with diverged nodes
Sometimes replicated nodes can diverge from each other. The state of all the nodes
might turn into
non-primary due to a network split between nodes, a cluster
crash, or if the replication plugin hits an exception when determining the primary component.
Then it’s necessary to select a node and promote it to the primary component.
To determine which node needs to be a reference, compare the
cluster status variable value on all nodes. If all the daemons are already
running there’s no need to start the cluster again. You just need to promote the
most advanced node to the
primary component with SET statement:
SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1
All other nodes will reconnect to the node and resync their data based on this node.