<
in the glibc strcoll_l
function. This was confirmed during my testing on Debian 12 with glibc version 2.36 (this OS and glibc are being used in our office's Docker image: https://hub.docker.com/_/postgres).test_db=# \d+ test
Table "public.test"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
value | character varying(10) | | not null | | extended | | |
Indexes:
"idx_test" btree (value)
Access method: heap
test_db=# \l test_db
List of databases
Name | Owner | Encoding | Locale Provider | Collate | Ctype | Locale | ICU Rules | Access privileges
---------+----------+----------+-----------------+------------+------------+--------+-----------+-------------------
test_db | postgres | UTF8 | libc | en_US.utf8 | en_US.utf8 | | |
(1 row)
root@715b19170a89:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@715b19170a89:~# ldd --version
ldd (Debian GLIBC 2.36-9+deb12u8) 2.36
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
root@715b19170a89:~# cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <string.h>
int main() {
char str1[] = "<";
char *str2 = malloc(65536); // 65535 characters + 1 for null terminator
if (!str2) return 1;
memset(str2, '<', 65535);
str2[65535] = '\0';
locale_t locale = newlocale(LC_COLLATE_MASK, "en_US.UTF-8", NULL);
int result = strcoll_l(str1, str2, locale);
printf("Comparison result: %d\n", result);
freelocale(locale);
free(str2);
return 0;
}
root@715b19170a89:~# time ./test
Comparison result: -1
real 0m4.487s
user 0m4.483s
sys 0m0.003s
en_US.UTF-8
) is widely used, and this opens up the potential for a Denial of Service (DoS) attack. For instance, if user input includes long strings of repeated characters like <
, it can severely degrade performance due to the extended processing time for string comparisons, especially in high-traffic scenarios.Andrey Stikheev <andrey.stikheev@xxxxxxxxx> writes:
> - Changing the collation to 'C' in the query significantly improves
> performance.
What collation are you using, pray tell? (And what database encoding?)
> - Is this performance degradation expected due to collation handling of
> certain special characters in PostgreSQL?
It seems like a performance bug indeed, but not ours --- I'm thinking
it must be in strcoll() or ICU.
Trying it here (on a RHEL8 machine) with en_US.utf8 locale, I see
something similar but not quite as bad:
u8=# SELECT 1 FROM test WHERE value = repeat('<', 65536);
?column?
----------
(0 rows)
Time: 1383.033 ms (00:01.383)
Poking into it with gdb says that the time is indeed spent inside
strcoll:
#0 get_next_seq (pass=1, indirect=0x7fabb4988ea8, extra=0x7fabb4984900 "",
table=0x7fabb490e2b0, weights=<optimized out>,
rulesets=0x7fabb490e2a8 "\001\002\001\005\001\001\001\005", nrules=4,
seq=<synthetic pointer>) at strcoll_l.c:111
#1 __GI___strcoll_l (s1=0x1785878 "<",
s2=0x178587a '<' <repeats 200 times>..., l=<optimized out>)
at strcoll_l.c:338
#2 0x00000000009527a6 in strncoll_libc (arg1=<optimized out>, len1=1,
arg2=<optimized out>, len2=65536, locale=<optimized out>,
locale=<optimized out>) at pg_locale.c:1964
#3 0x00000000009ac760 in varstr_cmp (arg1=0x7fabc2dcbfe9 "<", len1=1,
arg2=0x17958cc '<' <repeats 200 times>..., len2=65536,
collid=<optimized out>) at varlena.c:1567
#4 0x00000000009acfe3 in bttextcmp (fcinfo=0x7ffddd3b0590) at varlena.c:1820
#5 0x00000000009d75fa in FunctionCall2Coll (
flinfo=flinfo@entry=0x7ffddd3b10e8, collation=<optimized out>,
arg1=<optimized out>, arg2=<optimized out>) at fmgr.c:1161
#6 0x0000000000594948 in _bt_compare (rel=0x7fabcde7eed0, key=0x7ffddd3b10c0,
page=<optimized out>, offnum=<optimized out>) at nbtsearch.c:762
#7 0x0000000000594e32 in _bt_binsrch (rel=rel@entry=0x7fabcde7eed0,
key=key@entry=0x7ffddd3b10c0, buf=<optimized out>) at nbtsearch.c:394
It's not the fault of the index machinery, because a single comparison
takes the same amount of time:
u8=# select '<' <= repeat('<', 65536);
?column?
----------
t
(1 row)
Time: 1391.550 ms (00:01.392)
I didn't try it with ICU.
regards, tom lane
Andrey Stikheev