I am running a sort query on my machine. The test data is TPC-H at 100GB (tpch100g), and the query is:

    explain analyze verbose select l_shipdate, l_orderkey from lineitem_0 order by l_shipdate, l_orderkey desc;

With work_mem set to 65MB, the sort method is external merge (on disk), and the query takes 50s on my server.
With work_mem set to 6GB, the sort method is in-memory quicksort, and the query takes 78s on the same server.
It is strange that more memory gives worse performance. I profiled with perf and found that with work_mem at 6GB, L1-dcache-load-misses in qsort and tuplesort_gettuple_common are much higher than with the smaller work_mem setting.
So, could we split the memory into pieces, qsort each piece, and then merge them all in memory? I have tried this in my local code and got about a 12% improvement when memory is sufficient.
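
To make the idea concrete, here is a minimal standalone sketch in C. It is not the local patch mentioned above and does not touch tuplesort.c; it only illustrates the shape of "qsort each piece, then merge in memory" on a plain array of int64 keys. The names chunked_sort, merge_runs and cmp_int64, and the chunk parameter, are hypothetical and exist only for this illustration. A real change would have to work on SortTuples with the query's comparator (here l_shipdate ascending, l_orderkey descending), and the best chunk size would presumably depend on the cache sizes of the machine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort(): ascending int64 keys (stand-in for a tuple comparator). */
static int
cmp_int64(const void *a, const void *b)
{
    int64_t x = *(const int64_t *) a;
    int64_t y = *(const int64_t *) b;

    return (x > y) - (x < y);
}

/* Merge the sorted runs src[lo, mid) and src[mid, hi) into dst[lo, hi). */
static void
merge_runs(const int64_t *src, int64_t *dst, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;

    while (i < mid && j < hi)
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];
    while (i < mid)
        dst[k++] = src[i++];
    while (j < hi)
        dst[k++] = src[j++];
}

/*
 * Sort keys[0, n) by qsorting pieces of at most "chunk" elements and then
 * merging the sorted pieces pairwise, entirely in memory.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated.
 */
static int
chunked_sort(int64_t *keys, size_t n, size_t chunk)
{
    int64_t *src = keys, *dst, *scratch, *tmp;
    size_t   width, lo;

    assert(chunk > 0);
    if (n < 2)
        return 0;

    /* Phase 1: quicksort each piece independently. */
    for (lo = 0; lo < n; lo += chunk)
        qsort(keys + lo, (n - lo < chunk) ? n - lo : chunk,
              sizeof(int64_t), cmp_int64);

    if (chunk >= n)
        return 0;               /* a single piece: already sorted */

    scratch = malloc(n * sizeof(int64_t));
    if (scratch == NULL)
        return -1;
    dst = scratch;

    /* Phase 2: bottom-up merge, doubling the run width on every pass. */
    for (width = chunk; width < n; width *= 2)
    {
        for (lo = 0; lo < n; lo += 2 * width)
        {
            size_t mid = (n - lo < width) ? n : lo + width;
            size_t hi = (n - lo < 2 * width) ? n : lo + 2 * width;

            merge_runs(src, dst, lo, mid, hi);
        }
        tmp = src;
        src = dst;
        dst = tmp;
    }

    /* The final pass may have left the result in the scratch buffer. */
    if (src != keys)
        memcpy(keys, src, n * sizeof(int64_t));
    free(scratch);
    return 0;
}
```

Note that this sketch merges pairwise and needs a scratch buffer as large as the input; a single k-way merge over all pieces using a heap would be another option, and I have not measured which of the two behaves better with respect to cache misses.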