On Sun, 4 Dec 2022 at 00:14, yang zhao <azurezhao8369@xxxxxxxxx> wrote: > So,can we try to split memory to pieces and qsort every one,and merge than all in memory,I have tried this in my local code, and got about 12% improvement when memory is enough. We're not very good at CPU cache awareness. The hard part here would be to figure out how large to make those pieces. Maybe the value could be a GUC which can be tuned similar to work_mem. One problem with that might be that L3 is shared with other processes and it does not seem unreasonable that smaller-than-L3 sizes will be better when the system is busy. However, it also seems possible that sorting in L3-sized chunks might still be faster even when the system is busy with many other tasks. The best thing to do here is to post your proposal along with a patch and the performance tests you've done to the pgsql-hackers mailing list. This way people can test this for themselves and see if they get similar results. Patches such as this one need to be tested on lots of different hardware. pgsql-hackers is the place for discussions about that. It's full of people that might be willing to patch with your patch and test if the performance is also improved on their hardware. David