> On 11/03/2023 19:23 CET Alexey Murz Korepov <murznn@xxxxxxxxx> wrote: > > Could anyone suggest to me the ways to change a database collation with > removing all the duplicates, caused by this change? Collations can only affect uniqueness if they are nondeterministic or if you have functional indexes, e.g. using lower(text) for a case-insensitive unique index. Otherwise the collations only affect text ordering. You need to find all duplicates in the original database (or a copy) by using the target collation for comparisons[0]. Delete all duplicates, pg_dump that database and restore in database with new collation. > I have a pretty large database (around 500 Gb) that was created with > `en_US.UTF-8` collation, but the new version of the application requires > that the collation should be strictly `C`. Both collations should be deterministic[1] which means they consider byte sequences when comparing strings for equality. What does pg_collation say? select * from pg_collation where collname in ('C', 'en_US.utf8') > I can successfully create a dump of the old database using `pgdump`. > > But when I'm importing the dump to the new database with `COLLATE=C`, I see > a lot of errors on ALTER TABLE when creating primary keys, and the same - > for `CREATE INDEX` commands: Restore the dump in a new database with same collation en_US.UTF-8 to rule out that the duplicates come from collation C and do not already exist in the original database. > ALTER TABLE > ERROR: could not create unique index "access_tokens_pkey" > DETAIL: Key (id)=(16734) is duplicated. Is column "id" an integer? If so then this looks like a corrupt index in the original database because collations should not affect indexes on non-text columns. > ERROR: could not create unique index "access_tokens_token_key" > DETAIL: Key (token)=(XXX) is duplicated. > CONTEXT: parallel worker > ERROR: could not create unique index "account_data_uniqueness" > DETAIL: Key (user_id, account_data_type)=(@username:XXX, im.vector.setting.breadcrumbs) is duplicated. > CREATE INDEX > ERROR: could not create unique index "e2e_cross_signing_keys_idx" > DETAIL: Key (user_id, keytype, stream_id)=(@-----------------:matrix.org (http://matrix.org), master, 1606172) is duplicated. > ERROR: could not create unique index "e2e_cross_signing_keys_stream_idx" > DETAIL: Key (stream_id)=(1779009) is duplicated. > > So, could anyone give some advice on how to perform the collation change > with cleaning out all the duplicates? Thanks! [0] https://www.postgresql.org/docs/current/indexes-collations.html [1] https://www.postgresql.org/docs/current/collation.html#COLLATION-NONDETERMINISTIC -- Erik