Slow performance of collate "en_US.utf8"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi everyone!

I see poor performance of text sorting of collate "en_US.utf8" in PG 17.4.

Test query:

explain (analyze, costs, buffers, verbose)

SELECT

('БвЁжЫйяЙSёYz&$' || gen.id) collate "en_US.utf8"

FROM generate_series(1, 10000) AS gen(id)

order by 1 desc;

I've got execution time like:

Execution Time: 73.068 ms

Same poor result with ru_RU.UTF8.

With other collations time is much better:

explain (analyze, costs, buffers, verbose)

select

('БвЁжЫйяЙSёYz&$' || gen.id) collate "C"

from generate_series(1, 10000) AS gen(id)

order by 1 desc;

Execution Time: 4.792 ms

explain (analyze, costs, buffers, verbose)

SELECT

('БвЁжЫйяЙSёYz&$' || gen.id) collate "C.utf8"

FROM generate_series(1, 10000) AS gen(id)

order by 1 desc;

Execution Time: 7.473 ms

explain (analyze, costs, buffers, verbose)

select

('БвЁжЫйяЙSёYz&$' || gen.id) collate "und-x-icu"

from generate_series(1, 10000) AS gen(id)

order by 1 desc;

Execution Time: 13.282 ms

Yes, collate C is fastest, ICU collations is ~ 2 slower then C, but "en_US.utf8" is ~ 10x slower!

I suspect it is some performance issue over there.

Can someone of PG hackers reproduce this please?

1) This PG17 instance was installed with default options, and initdb got en_US.utf8 as system default collation and created PG cluster with it.

It seems like most PG databases are created this way with en_US.utf8 by default.

2) Typical text\varchar columns are created with DB default en_US.utf8 and performs poor.

explain (analyze, costs, buffers, verbose)

select

('БвЁжЫйяЙSёYz&$' || gen.id)

from generate_series(1, 10000) AS gen(id)

order by 1 desc;

Execution Time: 73.600 ms

3) The index search operations are also slower with en_US.utf8, but the difference is not as high.

Please see attached file with test table and indexes, with tests on index performance.

System Details:

postgres@borschev-pg-copydb1:~$ uname -a Linux borschev-pg-copydb1 6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-1 (2023-03-19) x86_64 GNU/Linux

postgres@borschev-pg-copydb1:~$ cat /etc/issue Debian GNU/Linux trixie/sid \n \l

select version();

PostgreSQL 17.4 (Debian 17.4-1.pgdg110+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit

SELECT * FROM pg_config();

|name |setting |

|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|

|BINDIR |/usr/lib/postgresql/17/bin |

|DOCDIR |/usr/share/doc/postgresql-doc-17 |

|HTMLDIR |/usr/share/doc/postgresql-doc-17 |

|INCLUDEDIR |/usr/include/postgresql |

|PKGINCLUDEDIR |/usr/include/postgresql |

|INCLUDEDIR-SERVER|/usr/include/postgresql/17/server |

|LIBDIR |/usr/lib/x86_64-linux-gnu |

|PKGLIBDIR |/usr/lib/postgresql/17/lib |

|LOCALEDIR |/usr/share/locale |

|MANDIR |/usr/share/postgresql/17/man |

|SHAREDIR |/usr/share/postgresql/17 |

|SYSCONFDIR |/etc/postgresql-common |

|PGXS |/usr/lib/postgresql/17/lib/pgxs/src/makefiles/pgxs.mk |

|CONFIGURE | '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--runstatedir=/run' '--disable-maintainer-mode' '--disable-dependency-tracking' '--with-tcl' '--with-perl' '--with-python' '--with-pam' '--with-openssl' '--with-libxml' '--with-libxslt' '--mandir=/usr/share/postgresql/17/man' '--docdir=/usr/share/doc/postgresql-doc-17' '--sysconfdir=/etc/postgresql-common' '--datarootdir=/usr/share/' '--datadir=/usr/share/postgresql/17' '--bindir=/usr/lib/postgresql/17/bin' '--libdir=/usr/lib/x86_64-linux-gnu/' '--libexecdir=/usr/lib/postgresql/' '--includedir=/usr/include/postgresql/' '--with-extra-version= (Debian 17.4-1.pgdg110+2)' '--enable-nls' '--enable-thread-safety' '--enable-debug' '--disable-rpath' '--with-uuid=e2fs' '--with-gnu-ld' '--with-gssapi' '--with-ldap' '--with-pgport=5432' '--with-system-tzdata=/usr/share/zoneinfo' 'AWK=mawk' 'MKDIR_P=/bin/mkdir -p' 'PROVE=/usr/bin/prove' 'PYTHON=/usr/bin/python3' 'TAR=/bin/tar' 'XSLTPROC=xsltproc --nonet' 'CFLAGS=-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now' '--enable-tap-tests' '--with-icu' '--with-llvm' 'LLVM_CONFIG=/usr/bin/llvm-config-16' 'CLANG=/usr/bin/clang-16' '--with-lz4' '--with-zstd' '--with-systemd' '--with-selinux' '--enable-dtrace' 'build_alias=x86_64-linux-gnu' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -fstack-protector-strong -Wformat -Werror=format-security'|

|CC |gcc |

|CPPFLAGS |-Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 |

|CFLAGS |-Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fno-omit-frame-pointer |

|CFLAGS_SL |-fPIC |

|LDFLAGS |-Wl,-z,relro -Wl,-z,now -L/usr/lib/llvm-16/lib -Wl,--as-needed |

|LDFLAGS_EX | |

|LDFLAGS_SL | |

|LIBS |-lpgcommon -lpgport -lselinux -lzstd -llz4 -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lpthread -lrt -ldl -lm |

|VERSION |PostgreSQL 17.4 (Debian 17.4-1.pgdg110+2)

select * from pg_database;

|oid |datname |datdba|encoding|datlocprovider|datistemplate|datallowconn|dathasloginevt|datconnlimit|datfrozenxid|datminmxid|dattablespace|datcollate |datctype |datlocale|daticurules|datcollversion|datacl |

|------|---------|------|--------|--------------|-------------|------------|--------------|------------|------------|----------|-------------|-----------|-----------|---------|-----------|--------------|-----------------------------------|

|5 |postgres |10 |6 |c |false |true |false |-1 |730 |1 |1,663 |en_US.UTF-8|en_US.UTF-8| | |2.40 | |

|16,388|demo |10 |6 |c |false |true |false |-1 |730 |1 |1,663 |en_US.UTF-8|en_US.UTF-8| | |2.40 | |

|1 |template1|10 |6 |c |true |true |false |-1 |730 |1 |1,663 |en_US.UTF-8|en_US.UTF-8| | |2.40 |{=c/postgres,postgres=CTc/postgres}|

|4 |template0|10 |6 |c |true |false |false |-1 |730 |1 |1,663 |en_US.UTF-8|en_US.UTF-8| | | |{=c/postgres,postgres=CTc/postgres}|

Collations are standard, out-of-the-box:

select collname, collprovider, collencoding, collcollate, collctype, colllocale,

collversion, collisdeterministic

FROM pg_collation cll

--where cll.collname in ('C.utf8', 'en_US.utf8', 'ru-RU-x-icu')

|collname |collprovider|collencoding|collcollate|collctype |colllocale|collversion|collisdeterministic|

|-----------|------------|------------|-----------|----------|----------|-----------|-------------------|

|default |d |-1 | | | | |true |

|C |c |-1 |C |C | | |true |

|POSIX |c |-1 |POSIX |POSIX | | |true |

|ucs_basic |b |6 | | |C |1 |true |

|pg_c_utf8 |b |6 | | |C.UTF-8 |1 |true |

|unicode |i |-1 | | |und |153.14 |true |

|C.utf8 |c |6 |C.utf8 |C.utf8 | | |true |

|en_US.utf8 |c |6 |en_US.utf8 |en_US.utf8| |2.40 |true |

|ru_RU.utf8 |c |6 |ru_RU.utf8 |ru_RU.utf8| |2.40 |true |

|en_US |c |6 |en_US.utf8 |en_US.utf8| |2.40 |true |

|ru_RU |c |6 |ru_RU.utf8 |ru_RU.utf8| |2.40 |true |

Attachment: test_coll_small_2_PG17.sql
Description: application/sql


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux