Search Postgresql Archives

Re: pg_trgm vs. Solr ngram

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2023-02-10 at 03:20 +0100, Chris wrote:
> I'm pondering migrating an FTS application from Solr to Postgres, just 
> because we use Postgres for everything else.
> 
> The application is basically fgrep with a web frontend. However the 
> indexed documents are very computer network specific and contain a lot 
> of hyphenated hostnames with dot-separated domains, as well as IPv4 and 
> IPv6 addresses. In Solr I was using ngrams and customized the 
> TokenizerFactories until more or less only whitespace was as separator, 
> while [.:-_\d] remains part of the ngrams. This allows to search for 
> ".12.255/32" or "xzy-eth5.example.org" without any false positives.
> 
> It looks like a straight conversion of this method is not possible since 
> the tokenization in pg_trgm is not configurable afaict. Is there some 
> other good method to search for a random substring including all the 
> punctuation using an index? Or a pg_trgm-style module that is more 
> flexible like the Solr/Lucene variant?'127.0.0.1/32'
> 
> Or maybe hacking my own pg_trgm wouldn't be so hard and could be fun, do 
> I pretty much just need to change the emitted tokens or will this lead 
> to significant complications in the operators, indexes etc.?

Here is a hack that you can try: pre-process your strings and replace
symbols with rare characters:

  SELECT show_trgm(translate('127.0.0.1/32', './', 'qx'));

                          show_trgm                        
  ═════════════════════════════════════════════════════════
   {"  1"," 12",0q0,0q1,127,1x3,27q,"32 ",7q0,q0q,q1x,x32}
  (1 row)

Then you could search like

  WHERE translate(search_string, './', 'qx') LIKE translate('%127.0.0.1/32%', './', 'qx')
    AND search_string LIKE '%127.0.0.1/32%'

The first condition can use a trigram index, and the second filters out
false positives.

Yours,
Laurenz Albe






[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux