Search Postgresql Archives

Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



David Rowley <dgrowleyml@xxxxxxxxx> writes:
> On Thu, 9 Feb 2023 at 13:05, Martin L. Buchanan
> <martinlbuchanan@xxxxxxxxx> wrote:
>> str LIKE '%foo%'
>> str ILIKE  '%foo%'
>> position('foo' in str) > 0
>> Is Boyer-Moore string searching now used by any of these three?

> We use a sort of "lossy" Boyer-Moore-Horspool algorithm. See
> text_position_setup() and text_position_next() in varlena.c (the lossy
> part comes from the skiptables possibly sharing the same entry for
> multiple characters as defined by what skiptablemask is set to).

Note that that's used only by the functions in that file; thus,
position() yes, but (I)LIKE no.

> Tom's argument seems to think it's impossible, so if you find that
> it's definitely not impossible, then you can assume he's wrong about
> that.

My point was that it seems like you'd need a separate BMH engine for
each %-separated segment of the LIKE pattern.  I'm not quite clear on
whether BMH can handle '_' (single-char wildcard) conveniently by
itself, although my gut feel is that you can probably make that part
work.  Maybe you can even extend the idea to embedded %, but that
seems more difficult.

Given that the fixed substrings of LIKE patterns are usually rather
short, I'm unconvinced that BMH would buy much compared to its
setup costs.  But hey, prove me wrong.

(One way to amortize the setup costs could be to cache pre-built
BMH data structures keyed by the pattern strings, in a similar fashion
to what utils/adt/regexp.c does for compiled regular expressions.)

> ... Getting
> stuff committed that causes performance regressions in some cases and
> wins in other cases can be a pretty difficult and frustrating process.
> You have to remember, even if you think the slowdown is some corner
> case that only applies ~1% of the time, for some users in the real
> world, that might be 100% of their queries.

Yeah, whatever the shape of the preprocessing might be, it seems
likely that it would be a net loss in some use-cases.  We do manage
to get past that --- the position() code didn't have BMH to start
with --- but it definitely requires solid evidence.

			regards, tom lane





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux