Hi everyone!
I see poor performance of text sorting of collate "en_US.utf8" in PG 17.4.
Test query:
explain (analyze, costs, buffers, verbose)
SELECT
('БвЁжЫйяЙSёYz&$' || gen.id) collate "en_US.utf8"
FROM generate_series(1, 10000) AS gen(id)
order by 1 desc;
I've got execution time like:
Execution Time: 73.068 ms
Same poor result with ru_RU.UTF8.
With other collations time is much better:
explain (analyze, costs, buffers, verbose)
select
('БвЁжЫйяЙSёYz&$' || gen.id) collate "C"
from generate_series(1, 10000) AS gen(id)
order by 1 desc;
Execution Time: 4.792 ms
explain (analyze, costs, buffers, verbose)
SELECT
('БвЁжЫйяЙSёYz&$' || gen.id) collate "C.utf8"
FROM generate_series(1, 10000) AS gen(id)
order by 1 desc;
Execution Time: 7.473 ms
explain (analyze, costs, buffers, verbose)
select
('БвЁжЫйяЙSёYz&$' || gen.id) collate "und-x-icu"
from generate_series(1, 10000) AS gen(id)
order by 1 desc;
Execution Time: 13.282 ms
Yes, collate C is fastest, ICU collations is ~ 2 slower then C, but "en_US.utf8" is ~ 10x slower!
I suspect it is some performance issue over there.
Can someone of PG hackers reproduce this please?
1) This PG17 instance was installed with default options, and initdb got en_US.utf8 as system default collation and created PG cluster with it.
It seems like most PG databases are created this way with en_US.utf8 by default.
2) Typical text\varchar columns are created with DB default en_US.utf8 and performs poor.
explain (analyze, costs, buffers, verbose)
select
('БвЁжЫйяЙSёYz&$' || gen.id)
from generate_series(1, 10000) AS gen(id)
order by 1 desc;
Execution Time: 73.600 ms
3) The index search operations are also slower with en_US.utf8, but the difference is not as high.
Please see attached file with test table and indexes, with tests on index performance.
System Details:
postgres@borschev-pg-copydb1:~$ uname -a Linux borschev-pg-copydb1 6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-1 (2023-03-19) x86_64 GNU/Linux
postgres@borschev-pg-copydb1:~$ cat /etc/issue Debian GNU/Linux trixie/sid \n \l
select version();
PostgreSQL 17.4 (Debian 17.4-1.pgdg110+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
SELECT * FROM pg_config();
|name |setting |
|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|BINDIR |/usr/lib/postgresql/17/bin |
|DOCDIR |/usr/share/doc/postgresql-doc-17 |
|HTMLDIR |/usr/share/doc/postgresql-doc-17 |
|INCLUDEDIR |/usr/include/postgresql |
|PKGINCLUDEDIR |/usr/include/postgresql |
|INCLUDEDIR-SERVER|/usr/include/postgresql/17/server |
|LIBDIR |/usr/lib/x86_64-linux-gnu |
|PKGLIBDIR |/usr/lib/postgresql/17/lib |
|LOCALEDIR |/usr/share/locale |
|MANDIR |/usr/share/postgresql/17/man |
|SHAREDIR |/usr/share/postgresql/17 |
|SYSCONFDIR |/etc/postgresql-common |
|PGXS |/usr/lib/postgresql/17/lib/pgxs/src/makefiles/pgxs.mk |
|CONFIGURE | '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--with-tcl' '--with-perl' '--with-python' '--with-pam' '--with-openssl' '--with-libxml' '--with-libxslt' '--mandir=/usr/share/postgresql/17/man' '--docdir=/usr/share/doc/postgresql-doc-17' '--sysconfdir=/etc/postgresql-common' '--datarootdir=/usr/share/' '--datadir=/usr/share/postgresql/17' '--bindir=/usr/lib/postgresql/17/bin' '--libdir=/usr/lib/x86_64-linux-gnu/' '--libexecdir=/usr/lib/postgresql/' '--includedir=/usr/include/postgresql/' '--with-extra-version= (Debian 17.4-1.pgdg110+2)' '--enable-nls' '--enable-thread-safety' '--enable-debug' '--disable-rpath' '--with-uuid=e2fs' '--with-gnu-ld' '--with-gssapi' '--with-ldap' '--with-pgport=5432' '--with-system-tzdata=/usr/share/zoneinfo' 'AWK=mawk' 'MKDIR_P=/bin/mkdir -p' 'PROVE=/usr/bin/prove' 'PYTHON=/usr/bin/python3' 'TAR=/bin/tar' 'XSLTPROC=xsltproc --nonet' 'CFLAGS=-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now' '--enable-tap-tests' '--with-icu' '--with-llvm' 'LLVM_CONFIG=/usr/bin/llvm-config-16' 'CLANG=/usr/bin/clang-16' '--with-lz4' '--with-zstd' '--with-systemd' '--with-selinux' '--enable-dtrace' 'build_alias=x86_64-linux-gnu' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -fstack-protector-strong -Wformat -Werror=format-security'|
|CC |gcc |
|CPPFLAGS |-Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 |
|CFLAGS |-Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer |
|CFLAGS_SL |-fPIC |
|LDFLAGS |-Wl,-z,relro -Wl,-z,now -L/usr/lib/llvm-16/lib -Wl,--as-needed |
|LDFLAGS_EX | |
|LDFLAGS_SL | |
|LIBS |-lpgcommon -lpgport -lselinux -lzstd -llz4 -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lpthread -lrt -ldl -lm |
|VERSION |PostgreSQL 17.4 (Debian 17.4-1.pgdg110+2)
select * from pg_database;
|oid |datname |datdba|encoding|datlocprovider|datistemplate|datallowconn|dathasloginevt|datconnlimit|datfrozenxid|datminmxid|dattablespace|datcollate |datctype |datlocale|daticurules|datcollversion|datacl |
|------|---------|------|--------|--------------|-------------|------------|--------------|------------|------------|----------|-------------|-----------|-----------|---------|-----------|--------------|-----------------------------------|
|5 |postgres |10 |6 |c |false |true |false |-1 |730 |1 |1,663 |en_US.UTF-8|en_US.UTF-8| | |2.40 | |
|16,388|demo |10 |6 |c |false |true |false |-1 |730 |1 |1,663 |en_US.UTF-8|en_US.UTF-8| | |2.40 | |
|1 |template1|10 |6 |c |true |true |false |-1 |730 |1 |1,663 |en_US.UTF-8|en_US.UTF-8| | |2.40 |{=c/postgres,postgres=CTc/postgres}|
|4 |template0|10 |6 |c |true |false |false |-1 |730 |1 |1,663 |en_US.UTF-8|en_US.UTF-8| | | |{=c/postgres,postgres=CTc/postgres}|
Collations are standard, out-of-the-box:
select collname, collprovider, collencoding, collcollate, collctype, colllocale,
collversion, collisdeterministic
FROM pg_collation cll
--where cll.collname in ('C.utf8', 'en_US.utf8', 'ru-RU-x-icu')
|collname |collprovider|collencoding|collcollate|collctype |colllocale|collversion|collisdeterministic|
|-----------|------------|------------|-----------|----------|----------|-----------|-------------------|
|default |d |-1 | | | | |true |
|C |c |-1 |C |C | | |true |
|POSIX |c |-1 |POSIX |POSIX | | |true |
|ucs_basic |b |6 | | |C |1 |true |
|pg_c_utf8 |b |6 | | |C.UTF-8 |1 |true |
|unicode |i |-1 | | |und |153.14 |true |
|C.utf8 |c |6 |C.utf8 |C.utf8 | | |true |
|en_US.utf8 |c |6 |en_US.utf8 |en_US.utf8| |2.40 |true |
|ru_RU.utf8 |c |6 |ru_RU.utf8 |ru_RU.utf8| |2.40 |true |
|en_US |c |6 |en_US.utf8 |en_US.utf8| |2.40 |true |
|ru_RU |c |6 |ru_RU.utf8 |ru_RU.utf8| |2.40 |true |
Attachment:
test_coll_small_2_PG17.sql
Description: application/sql