Re: Performance degradation in Index searches with special characters

Thomas Munro <thomas.munro@xxxxxxxxx> · Mon, 7 Oct 2024 09:24:15 +1300

On Mon, Oct 7, 2024 at 9:02 AM Shiv Iyer <shiv@xxxxxxxxxxxxx> wrote:
>    - As the string length increases, the performance degrades exponentially when using special characters. This is due to the collation’s computational complexity for each additional character comparison.

That's a pretty interesting observation, worthy of a bug report.  I
don't know the details offhand, but IIRC the algorithm used should be
basically the same as what other collation implementations use, or
more likely a simpler subset of it.  Don't quote me, but I think it's
supposed to be ISO 14651, which is essentially a subset of the UCA
stuff that ICU is using (it is "aligned with" UCA DUCET), without CLDR
customisations and perhaps various other complications.  So it doesn't
sound like it should be fundamentally required to be *more* expensive
than ICU...
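
For a bug report, a standalone timing test outside the database might
help narrow things down.  What follows is only a minimal sketch under
stated assumptions, not something taken from this thread: it uses
Python's locale.strcoll(), which goes through the platform C library's
collation routines, and the helper names (time_strcoll, use_locale),
the en_US.UTF-8 locale and the "#" test character are all arbitrary
choices for illustration.  It does not exercise PostgreSQL's own code
paths.

```python
import locale
import time


def time_strcoll(lengths, unit="a", repeats=200):
    """Return (n, seconds per strcoll call) for strings built from `unit`."""
    results = []
    for n in lengths:
        # Two strings that differ only in the last character, so the
        # comparison cannot stop early on a leading difference.
        a = unit * n + "a"
        b = unit * n + "b"
        start = time.perf_counter()
        for _ in range(repeats):
            locale.strcoll(a, b)
        elapsed = time.perf_counter() - start
        results.append((n, elapsed / repeats))
    return results


def use_locale(name):
    """Try to select a collation locale; fall back to "C" if unavailable."""
    try:
        return locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return locale.setlocale(locale.LC_COLLATE, "C")


if __name__ == "__main__":
    # Assumption: en_US.UTF-8 is installed; otherwise this reports "C".
    print("collation locale:", use_locale("en_US.UTF-8"))
    for label, unit in (("letters", "a"), ("punctuation", "#")):
        for n, secs in time_strcoll([100, 1000, 10000], unit=unit):
            print(f"{label:12} n={n:6} {secs * 1e6:10.2f} us/compare")
```

If the per-comparison time for the punctuation rows grows much faster
than linearly with n while the letter rows do not, that would point at
the C library's collation rather than at the index code.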