On 10/29/20 9:05 AM, Jens Axboe wrote:
> On 10/29/20 9:03 AM, Matthew Wilcox wrote:
>> On Thu, Oct 29, 2020 at 09:02:31AM -0600, Jens Axboe wrote:
>>> On 10/29/20 8:57 AM, Matthew Wilcox wrote:
>>>> On Thu, Oct 29, 2020 at 07:57:34AM -0600, Jens Axboe wrote:
>>>>> On 10/28/20 4:26 PM, Jens Axboe wrote:
>>>>>> I did see some wins when I tested this. I'll try and run some testing
>>>>>> tomorrow and report back. If there's something specifically you want to
>>>>>> see tested, let me know.
>>>>>
>>>>> I did some testing; unfortunately it's _very_ hard to produce somewhat
>>>>> consistent and good numbers, as it quickly becomes a game of kswapd.
>>>>> Here's a basic case of 4 threads doing 32k random reads:
>>>>>
>>>>>   PID USER    PR  NI    VIRT   RES   SHR S  %CPU %MEM   TIME+ COMMAND
>>>>>   462 root    20   0       0     0     0 R  65.5  0.0 0:08.02 kswapd0
>>>>>  2287 axboe   20   0 1303448  2176  1072 R  46.6  0.0 0:05.35 fio
>>>>>  2289 axboe   20   0 1303456  2196  1092 D  46.6  0.0 0:05.34 fio
>>>>>  2290 axboe   20   0 1303460  2216  1112 D  46.6  0.0 0:05.37 fio
>>>>>  2288 axboe   20   0 1303452  2224  1120 R  45.9  0.0 0:05.33 fio
>>>>>
>>>>> Sad face... Unfortunately, once kswapd kicks in, performance also
>>>>> plummets. This box only has 32G of RAM, and you can fill that in less
>>>>> than 10 seconds doing buffered reads like that.
>>>>>
>>>>> I ran 4k and 32k testing, using 1 and 4 threads. But given the above
>>>>> sadness, it quickly ends up looking the same for me.
>>>>
>>>> What if your workload actually fits in memory? That would seem to be
>>>> the situation where Kent's patches would make a difference.
>>>
>>> That was my point: if I do multi-page reads, then memory is filled in
>>> seconds, which makes it pretty hard to provide any accurate numbers. I
>>> don't have anything slow in this test box; I'll see if I can find
>>> something to stick in it.
>>
>> I meant re-reading files which fit in memory, so you take ten seconds
>> to fill the page cache, then read from the same files over and over.
>
> That I can certainly try.

Reading a 16G file randomly for 10 seconds, using 1 or 4 threads and
either 4k or 32k reads:

test              5.10-rc1      5.10-rc1+kent
-------------------------------------------------------------
1 thread, 4k          976K       1030K (+5.5%)
4 threads, 4k        3462K       3453K (-0.3%)
1 thread, 32k         299K        322K (+7.7%)
4 threads, 32k        769K        785K (+2.0%)

--
Jens Axboe
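
The exact fio job used above isn't included in the thread; an
invocation along these lines (the file path and job name here are
made up for illustration) would match the 4-thread, 32k buffered
random-read case described:

  # buffered random reads: 4 threads, 32k blocks, 16G file, 10 seconds
  fio --name=bufread --filename=/data/testfile --size=16g \
      --rw=randread --bs=32k --numjobs=4 \
      --ioengine=psync --time_based --runtime=10 \
      --group_reporting

Leaving out --direct=1 keeps the reads buffered, so they go through
the page cache that this comparison exercises; dropping --numjobs to
1 and --bs to 4k covers the other rows of the table above.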