Re: How can sort performance be so different

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Peter

I did check out using ICU and the performance does indeed seem
comparable with C locale:

EXPLAIN ANALYZE select * from chart order by name COLLATE "lo-x-icu";
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1470.65..1504.24 rows=13436 width=1203) (actual
time=82.752..85.723 rows=13436 loops=1)
   Sort Key: name COLLATE "lo-x-icu"
   Sort Method: quicksort  Memory: 6253kB
   ->  Seq Scan on chart  (cost=0.00..549.36 rows=13436 width=1203)
(actual time=0.043..12.634 rows=13436 loops=1)
 Planning time: 1.610 ms
 Execution time: 96.060 ms
(6 rows)

The Laos folk have confirmed that the sort order with C locale was not
correct.  So setting the ICU locale seems to be the way forward.

The problem is that this is a large java application with a great
number of tables and queries.  Also it is used in 60+ countries not
just Laos.  So we cannot simply modify the queries or table creation
scripts directly such as in the manner above.  I was hoping the
solution would just be to set a default locale on the database
(perhaps even und-x-icu) but I see now that this doesn't seem to be
currently possible with postgresql 10 ie. set the locale on database
creation to a *-icu locale.

Is this also a limitation on postgresql 11?  (Upgrading would be possible)

Any other workarounds worth trying?  The magnitude of this issue is
significant - 1000x slower on these basic sorts is crippling the
application, probably also in a number of other queries.

Regards
Bob

On Wed, 30 Jan 2019 at 23:54, Peter Geoghegan <pg@xxxxxxx> wrote:
>
> On Wed, Jan 30, 2019 at 3:57 AM Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
> > (i) whether the sort order makes sense for the Laos names; and
> > (ii) what the locale settings are on the production server where the
> > problem was first reported.
> >
> > There will be some turnaround before I get this information.  I am
> > guessing that the database is using "en_US" rather than anything Laos
> > specific.  In which case "C" would probably be no worse re sort order.
> > But will know better soon.
> >
> > This has been a long but very fruitful investigation.  Thank you all for input.
>
> If you can find a way to use an ICU collation, it may be possible to
> get Laotian sort order with performance that's a lot closer to the
> performance you see with the C locale. The difference that you're
> seeing is obviously explainable in large part by the C locale using
> the abbreviated keys technique. The system glibc's collations cannot
> use this optimization.
>
> I believe that some locales have inherently more expensive
> normalization processes (comparisons) than others, but if you can
> effective amortize the cost per key by building an abbreviated key, it
> may not matter that much. And, ICU may be faster than glibc anyway.
>
> --
> Peter Geoghegan




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux