On Fri, 2023-02-10 at 03:20 +0100, Chris wrote:
> I'm pondering migrating an FTS application from Solr to Postgres, just
> because we use Postgres for everything else.
>
> The application is basically fgrep with a web frontend. However, the
> indexed documents are very computer-network specific and contain a lot
> of hyphenated hostnames with dot-separated domains, as well as IPv4 and
> IPv6 addresses. In Solr I was using ngrams and customized the
> TokenizerFactories until more or less only whitespace acted as a
> separator, while [.:-_\d] remained part of the ngrams. This allows
> searching for ".12.255/32" or "xzy-eth5.example.org" without any false
> positives.
>
> It looks like a straight conversion of this method is not possible,
> since the tokenization in pg_trgm is not configurable, afaict. Is there
> some other good method to search for an arbitrary substring, including
> all the punctuation, using an index? Or a pg_trgm-style module that is
> more flexible, like the Solr/Lucene variant?
>
> Or maybe hacking my own pg_trgm wouldn't be so hard and could be fun.
> Do I pretty much just need to change the emitted tokens, or will this
> lead to significant complications in the operators, indexes etc.?

Here is a hack that you can try: pre-process your strings and replace
the troublesome symbols with rare characters:

  SELECT show_trgm(translate('127.0.0.1/32', './', 'qx'));

                          show_trgm
  ═════════════════════════════════════════════════════════
   {" 1"," 12",0q0,0q1,127,1x3,27q,"32 ",7q0,q0q,q1x,x32}
  (1 row)

Then you could search like

  WHERE translate(search_string, './', 'qx')
        LIKE translate('%127.0.0.1/32%', './', 'qx')
    AND search_string LIKE '%127.0.0.1/32%'

The first condition can use a trigram index, and the second filters
out false positives.

Yours,
Laurenz Albe
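
P.S.: For the first condition to actually use an index, you'd need an
expression index on exactly the same translate() call as in the WHERE
clause. A minimal sketch; the table "docs" and column "search_string"
are made-up names for illustration:

  -- pg_trgm provides the gin_trgm_ops operator class
  CREATE EXTENSION IF NOT EXISTS pg_trgm;

  -- Expression index: the expression must match the query verbatim,
  -- otherwise the planner cannot use it
  CREATE INDEX docs_search_trgm_idx ON docs
      USING gin (translate(search_string, './', 'qx') gin_trgm_ops);

  SELECT *
  FROM docs
  WHERE translate(search_string, './', 'qx')
        LIKE translate('%127.0.0.1/32%', './', 'qx')
    AND search_string LIKE '%127.0.0.1/32%';

The choice of replacement characters matters: 'q' and 'x' work here
only as long as they are rare in (or absent from) the indexed data,
since collisions would reduce the index's selectivity, not its
correctness, as the second LIKE rechecks the original string anyway.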