David Rowley <dgrowleyml@xxxxxxxxx> writes: > On Thu, 9 Feb 2023 at 13:05, Martin L. Buchanan > <martinlbuchanan@xxxxxxxxx> wrote: >> str LIKE '%foo%' >> str ILIKE '%foo%' >> position('foo' in str) > 0 >> Is Boyer-Moore string searching now used by any of these three? > We use a sort of "lossy" Boyer-Moore-Horspool algorithm. See > text_position_setup() and text_position_next() in varlena.c (the lossy > part comes from the skiptables possibly sharing the same entry for > multiple characters as defined by what skiptablemask is set to). Note that that's used only by the functions in that file; thus, position() yes, but (I)LIKE no. > Tom's argument seems to think it's impossible, so if you find that > it's definitely not impossible, then you can assume he's wrong about > that. My point was that it seems like you'd need a separate BMH engine for each %-separated segment of the LIKE pattern. I'm not quite clear on whether BMH can handle '_' (single-char wildcard) conveniently by itself, although my gut feel is that you can probably make that part work. Maybe you can even extend the idea to embedded %, but that seems more difficult. Given that the fixed substrings of LIKE patterns are usually rather short, I'm unconvinced that BMH would buy much compared to its setup costs. But hey, prove me wrong. (One way to amortize the setup costs could be to cache pre-built BMH data structures keyed by the pattern strings, in a similar fashion to what utils/adt/regexp.c does for compiled regular expressions.) > ... Getting > stuff committed that causes performance regressions in some cases and > wins in other cases can be a pretty difficult and frustrating process. > You have to remember, even if you think the slowdown is some corner > case that only applies ~1% of the time, for some users in the real > world, that might be 100% of their queries. Yeah, whatever the shape of the preprocessing might be, it seems likely that it would be a net loss in some use-cases. We do manage to get past that --- the position() code didn't have BMH to start with --- but it definitely requires solid evidence. regards, tom lane