On Fri, 22 Jul 2022 at 16:46, Sergey Burladyan <eshkinkot@xxxxxxxxx> wrote:
> I thought that the sorting values are stored entirely in work_mem, but in fact it works somehow differently.
> Can anyone suggest how this works?

In the classic go-by-chunks way? To sort values you need to compare
them, and to compare strings you do not need the whole string. Say
you have two 1000-byte strings, one is 500 'A's followed by 500 'B's,
the other is 1000 'A's. To compare them (using the C locale; others
can be done in a similar way) you can read 10 bytes from each and
compare: if they are the same, read 10 more; if they differ, you are
done. If you hit the end of both strings, they are equal; if you hit
the end of only one (the shorter), that one goes first.

You can even do it a character at a time. In the example, after
looping 50 times on 10 'A's from each side you hit 10 'B's against
10 'A's, so the second string goes first and you do not even need to
look at the rest. A char at a time would end on the 501st char.
(A rough C sketch of this chunked comparison is appended at the end
of this message.)

And PG can probably compare the strings in the shared buffers, so it
only needs some housekeeping information in work_mem and relies on
its infrastructure to bring the contents into shared buffers.

I do not think you are estimating memory usage right.

Francisco Olarte.

>
> For example, I created this 512MB incompressible file and test table:
>
> tr -dcs '[:print:]' '[:print:]' < /dev/urandom | tr -d '"'\' | dd bs=1K count=512K of=asciidump
>
> create unlogged table t1 (v text);
> insert into t1 select pg_read_file('asciidump') from generate_series(1, 10);
>
> select pg_column_size(v), octet_length(v) from t1 limit 1;
>  pg_column_size | octet_length
> ----------------+--------------
>       536870912 |    536870912
>
> set work_mem to '64MB';
>
> Now I think that 64MB is not enough to sort such large values and postgres will use temp files,
> but in fact it does not.
>
> select temp_files, temp_bytes from pg_stat_database where datname = current_catalog;
>  temp_files | temp_bytes
> ------------+------------
>           0 |          0
>
> explain (analyze,verbose,buffers) select v from t1 order by v;
>                                                    QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
>  Sort  (cost=94.38..97.78 rows=1360 width=32) (actual time=6433.138..6433.140 rows=10 loops=1)
>    Output: v
>    Sort Key: t1.v
>    Sort Method: quicksort  Memory: 25kB
>    Buffers: shared hit=543881 read=679794 written=118012
>    ->  Seq Scan on public.t1  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.007..0.009 rows=10 loops=1)
>          Output: v
>          Buffers: shared hit=1
>  Planning Time: 0.035 ms
>  Execution Time: 6433.155 ms
>
> > Sort Method: quicksort  Memory: 25kB
>
> select temp_files, temp_bytes from pg_stat_database where datname = current_catalog;
>  temp_files | temp_bytes
> ------------+------------
>           0 |          0
>
> WOW! How does it work?! :-)
>
> --
> Sergey Burladyan
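
PS: to make the chunked comparison above concrete, here is a rough
standalone sketch in C. This is not PostgreSQL's actual sort or
comparison code; chunked_cmp, CHUNK and the demo strings are invented
for illustration only. It compares two byte strings a fixed number of
bytes at a time under plain C-locale (memcmp) rules and stops as soon
as a chunk differs or one string runs out, so it never needs both
whole values in memory at once.

/* Illustration of chunk-at-a-time comparison; not PostgreSQL code. */
#include <stdio.h>
#include <string.h>

#define CHUNK 10

/* Compare two byte strings a chunk at a time, C-locale style.
 * Returns <0, 0 or >0.  Only CHUNK bytes of each string are looked
 * at per iteration. */
static int
chunked_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t off = 0;

    for (;;)
    {
        size_t na = alen - off, nb = blen - off;
        size_t n = (na < nb ? na : nb);
        int    c;

        if (n > CHUNK)
            n = CHUNK;
        if (n == 0)                 /* hit the end of one (or both) */
            return (alen > blen) - (alen < blen);   /* shorter first */

        c = memcmp(a + off, b + off, n);
        if (c != 0)                 /* chunks differ: done, rest unread */
            return c;

        off += n;                   /* equal so far: read some more */
    }
}

int
main(void)
{
    char s1[1000], s2[1000];

    memset(s1, 'A', 500);           /* 500 x 'A' followed by 500 x 'B' */
    memset(s1 + 500, 'B', 500);
    memset(s2, 'A', 1000);          /* 1000 x 'A' */

    /* Prints a positive number: s1 sorts after s2, and the decision
     * is made after reading only about 510 bytes of each string. */
    printf("%d\n", chunked_cmp(s1, sizeof s1, s2, sizeof s2));
    return 0;
}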