TSV/CSV data source

This is the simplest way to pass data to the indexer. It was created to address a limitation of xmlpipe2: the indexer must map every attribute and field tag in the XML file to the corresponding schema element, and this mapping takes time that grows with the number of fields and attributes in the schema. tsvpipe has no such overhead, because each field and attribute is simply a column at a fixed position in the TSV file. As a result, tsvpipe can work slightly faster than xmlpipe2 in some cases.

The first column of a TSV/CSV file must be the document ID. The remaining columns must mirror the declarations of fields and attributes in the schema definition, in the same order.
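To illustrate why positional columns need no tag-to-schema mapping, here is a minimal Python sketch (not the indexer's actual code) that splits one TSV line purely by position. The SCHEMA list and the parse_tsv_row function are hypothetical; SCHEMA stands in for the order of the tsvpipe_* declarations in the tsv_test example.

```python
from __future__ import annotations

# Hypothetical schema order: it must follow the order of the tsvpipe_* declarations
SCHEMA = ["name", "genre_tags"]


def parse_tsv_row(line: str, schema: list[str]) -> tuple[int, dict[str, str]]:
    """Split one TSV line into (document ID, {column name: raw value}).

    Columns are matched to schema entries only by their position,
    so no per-tag lookup (as in xmlpipe2) is needed.
    """
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(schema) + 1:
        raise ValueError(f"expected {len(schema) + 1} columns, got {len(columns)}")
    doc_id = int(columns[0])  # the first column is always the document ID
    return doc_id, dict(zip(schema, columns[1:]))


doc_id, values = parse_tsv_row("1\tLed Zeppelin\t35,23,16\n", SCHEMA)
print(doc_id, values)
```

A row with too few or too many columns is rejected here, since a shifted column would silently land in the wrong schema slot.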

tsvpipe and csvpipe differ in their delimiters and quoting rules. tsvpipe uses a hardcoded tab character as the delimiter and has no quoting rules. csvpipe has a csvpipe_delimiter option for the delimiter, with a default value of ‘,’, and it also follows these quoting rules:

  • any field may be quoted
  • fields containing a line break, a double quote or a comma should be quoted
  • a double quote character in a field must be represented by two double quote characters
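The quoting rules listed above match what Python's standard csv module produces with its default minimal quoting, so one way to prepare csvpipe input is a sketch like the following. The row with ID 4 and its name are hypothetical sample data; the MVA value "35,23,16" contains commas and therefore gets quoted, assuming csvpipe then reads it back as a single column per the rules above.

```python
import csv
import io

rows = [
    (1, "Led Zeppelin", "35,23,16"),
    (4, 'Band "X"', "35"),  # hypothetical row with an embedded double quote
]

buf = io.StringIO()
# Default QUOTE_MINIMAL: values containing the delimiter or a double quote are
# wrapped in double quotes, and each embedded double quote is doubled.
writer = csv.writer(buf, delimiter=",", lineterminator="\n")
writer.writerows(rows)
print(buf.getvalue(), end="")
```

If csvpipe_delimiter is set to something else, for example ';', pass the same character as delimiter; the comma-separated MVA value would then no longer need quoting.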

tsvpipe and csvpipe use the same field and attribute declaration directives as xmlpipe2.

tsvpipe declarations:

tsvpipe_command, tsvpipe_field, tsvpipe_field_string, tsvpipe_attr_uint, tsvpipe_attr_timestamp, tsvpipe_attr_bool, tsvpipe_attr_float, tsvpipe_attr_bigint, tsvpipe_attr_multi, tsvpipe_attr_multi_64, tsvpipe_attr_string, tsvpipe_attr_json

csvpipe declarations:

csvpipe_command, csvpipe_field, csvpipe_field_string, csvpipe_attr_uint, csvpipe_attr_timestamp, csvpipe_attr_bool, csvpipe_attr_float, csvpipe_attr_bigint, csvpipe_attr_multi, csvpipe_attr_multi_64, csvpipe_attr_string, csvpipe_attr_json

Example source definition:

source tsv_test
{
    type = tsvpipe
    tsvpipe_command = cat /tmp/rock_bands.tsv
    tsvpipe_field = name
    tsvpipe_attr_multi = genre_tags
}

Contents of /tmp/rock_bands.tsv (columns are separated by tab characters):

1	Led Zeppelin	35,23,16
2	Deep Purple	35,92
3	Frank Zappa	35,23,16,92,33,24
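Because tsvpipe_command only needs to write the TSV data to standard output, the file could also be replaced by a script. The following is a minimal Python sketch that emits the same three rows; the BANDS list, the clean and tsv_line helpers, and a command such as tsvpipe_command = python3 /path/to/gen_bands.py are all hypothetical. Since tsvpipe has no quoting rules, a tab or line break inside a value would shift the columns, so replacing them with spaces is an assumed workaround, not a documented requirement.

```python
from __future__ import annotations

import sys

# Sample rows from the example: (document ID, name, genre_tags MVA values)
BANDS = [
    (1, "Led Zeppelin", [35, 23, 16]),
    (2, "Deep Purple", [35, 92]),
    (3, "Frank Zappa", [35, 23, 16, 92, 33, 24]),
]


def clean(value: str) -> str:
    # Assumed workaround: tsvpipe cannot quote, so strip tabs and line breaks
    return value.replace("\t", " ").replace("\r", " ").replace("\n", " ")


def tsv_line(doc_id: int, name: str, tags: list[int]) -> str:
    # Column order: document ID, then tsvpipe_field = name,
    # then tsvpipe_attr_multi = genre_tags as comma-separated IDs
    return "\t".join([str(doc_id), clean(name), ",".join(map(str, tags))])


if __name__ == "__main__":
    for row in BANDS:
        sys.stdout.write(tsv_line(*row) + "\n")
```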