Reece Hart <reece@xxxxxxxxx> writes:
> For the purposes of indexing these names, I suspect I'd get the majority
> of cases by removing a hyphen when it's followed by 1 or 2 chars from
> [a-zA-Z0-9]. Does that require a custom parser?

Yeah, looks like it:

regression=# select * from ts_debug('MCL1 MCL-1');
   alias   |       description        | token |  dictionaries  |  dictionary  | lexemes
-----------+--------------------------+-------+----------------+--------------+---------
 numword   | Word, letters and digits | MCL1  | {simple}       | simple       | {mcl1}
 blank     | Space symbols            |       | {}             |              |
 asciiword | Word, all ASCII          | MCL   | {english_stem} | english_stem | {mcl}
 int       | Signed integer           | -1    | {simple}       | simple       | {-1}
(4 rows)

I had thought you might get a "numhword" output, but that only seems to
happen if there's at least one letter after the dash:

regression=# select * from ts_debug('MCL1 MCL-X1');
      alias      |               description                | token  |  dictionaries  |  dictionary  | lexemes
-----------------+------------------------------------------+--------+----------------+--------------+----------
 numword         | Word, letters and digits                 | MCL1   | {simple}       | simple       | {mcl1}
 blank           | Space symbols                            |        | {}             |              |
 numhword        | Hyphenated word, letters and digits      | MCL-X1 | {simple}       | simple       | {mcl-x1}
 hword_asciipart | Hyphenated word part, all ASCII          | MCL    | {english_stem} | english_stem | {mcl}
 blank           | Space symbols                            | -      | {}             |              |
 hword_numpart   | Hyphenated word part, letters and digits | X1     | {simple}       | simple       | {x1}
(6 rows)

regards, tom lane
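
For illustration only, here is a minimal sketch of the hyphen rule quoted above, applied as a client-side preprocessing step before the text ever reaches the parser. This is not the custom-parser route discussed in the message, and nothing in the thread says it was the approach taken. The function name collapse_short_hyphen is hypothetical, and the two boundary conditions (the hyphen must follow an alphanumeric character, and the 1 or 2 characters must end the token) are assumptions added to keep ordinary hyphenated words and signed integers untouched.

```python
import re

# Assumed reading of the rule: drop a hyphen when it follows an alphanumeric
# character and is followed by exactly 1 or 2 characters from [a-zA-Z0-9]
# that end the token (next character is non-alphanumeric or end of string).
_SHORT_SUFFIX = re.compile(r"(?<=[A-Za-z0-9])-([A-Za-z0-9]{1,2})(?![A-Za-z0-9])")


def collapse_short_hyphen(text: str) -> str:
    """Return text with short trailing hyphen suffixes joined to their stem.

    Hypothetical helper: 'MCL-1' becomes 'MCL1', while 'well-known' and a
    standalone '-1' are left unchanged.
    """
    return _SHORT_SUFFIX.sub(r"\1", text)


if __name__ == "__main__":
    for sample in ("MCL-1", "MCL-X1", "IL-2 receptor", "well-known", "x -1"):
        print(repr(sample), "->", repr(collapse_short_hyphen(sample)))
```

If something along these lines were used, the same normalization would presumably have to be applied to query strings as well as to the indexed documents, so that both sides produce the same lexemes.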