Re: The dangers of streaming across versions of glibc: A cautionary tale

Matthew Kelly <mkelly@xxxxxxxxxxxxxxx> · Thu, 7 Aug 2014 15:07:04 +0000

We are currently running with the en_US.UTF-8 collation.  It was a decision made long ago, and seeing as we never actually rely on the sort order of internationalized strings (other than for stability, apparently), we have never had any motivation to change this practice.

Some way of versioning collations, which is not tied to glibc seems immensely appealing.  Without a good way of testing the differences between glibc sort versions, it seems the only safe thing to do at the moment is to guarantee all streaming replica's run from the exact same OS image.  Which is fine until you want to upgrade your OS, and need to do a dump-restore instead of being able to do that in a rolling fashion.

To Bruce's point the way I was able to test for this issue in a particular index was (approximately):
--Assuming textfield is what the index is on, this causes the query planner to scan the index and give the position in the index.
CREATE TABLE index_order (SELECT textfield, dense_rank() OVER (ORDER BY textfield) as i_order FROM table);
--No index here, postgres must sort
CREATE TABLE both_order as (SELECT textfield, i_order, dense_rank() OVER (ORDER BY textfield) as sort_order FROM index_order);
-- If this doesn't return zero, you have a problem
SELECT count(*) FROM both_orders WHERE i_order <> sort_order;

This method is really slow on a big table, and I'm not going to promise it always works, but that is how we found the root cause.

- Matt K

On Aug 6, 2014, at 9:46 PM, Peter Geoghegan <peter.geoghegan86@xxxxxxxxx>
 wrote:

> On Wed, Aug 6, 2014 at 6:30 PM, Tatsuo Ishii <ishii@xxxxxxxxxxxxxx> wrote:
>> Another idea could be having our own collation data to isolate any
>> changes from outside world. I vaguley recall this had been discussed
>> before.
> 
> That's probably the best solution. It would not be the first time that
> we decided to stop relying on the operating system's facilities due to
> various problems (e.g. we used to use the C standard library qsort()
> until about 2006). The only problem is that it's a lot of work. One
> possible solution that has been proposed is to adopt ICU [1]. That
> might allow us to say "this is the official way that PostgreSQL 9.6
> sorts Japanese; you may use the old way if you want, but it's
> incompatible with the new way". ICU would give us a standard
> versioning interface [2]. They seem to take this seriously, and are
> aware of our considerations around B-Tree indexes on text.
> 
> [1] https://wiki.postgresql.org/wiki/Todo:ICU
> [2] http://userguide.icu-project.org/collation/architecture#TOC-Versioning
> -- 
> Regards,
> Peter Geoghegan

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general