Re: + fs-break-generic_file_buffered_read-up-into-multiple-functions.patch added to -mm tree

Jens Axboe <axboe@xxxxxxxxx> · Thu, 29 Oct 2020 07:57:34 -0600

On 10/28/20 4:26 PM, Jens Axboe wrote:
> On 10/28/20 4:22 PM, Andrew Morton wrote:
>> On Tue, 27 Oct 2020 13:35:51 +0000 Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>>
>>> On Sun, Oct 25, 2020 at 03:08:17PM -0700, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
>>>> The patch titled
>>>>      Subject: mm/filemap/c: break generic_file_buffered_read up into multiple functions
>>>> has been added to the -mm tree.  Its filename is
>>>>      fs-break-generic_file_buffered_read-up-into-multiple-functions.patch
>>>
>>> Can we back this out?  It really makes the THP patchset unhappy.  I think
>>> we can do something like this afterwards, but doing it this way round is
>>> ridiculously hard.
>>
>> Two concerns:
>>
>> : On my test box, 4k buffered random reads go from ~150k to ~250k iops,
>> : and the improvements to big sequential reads are even bigger.
>>
>> That's a big improvement!  We want that improvement.  Throwing it away
>> on behalf of an as-yet-unmerged feature patchset hurts.  Can we expect that
>> this improvement will be available post-that-patchset?  And when?
>>
>> (This improvement is rather hard to believe, really - more details on the
>> test environment would be useful.  Can we expect that people will in
>> general see similar benefits or was there something special about the
>> testing?)
> 
> I did see some wins when I tested this. I'll try and run some testing
> tomorrow and report back. If there's something specifically you want to
> see tested, let me know.

I did some testing; unfortunately it's _very_ hard to produce somewhat
consistent and good numbers, as it quickly becomes a game of kswapd.
Here's a basic case of 4 threads doing 32k random reads:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    462 root      20   0       0      0      0 R  65.5   0.0   0:08.02 kswapd0
   2287 axboe     20   0 1303448   2176   1072 R  46.6   0.0   0:05.35 fio
   2289 axboe     20   0 1303456   2196   1092 D  46.6   0.0   0:05.34 fio
   2290 axboe     20   0 1303460   2216   1112 D  46.6   0.0   0:05.37 fio
   2288 axboe     20   0 1303452   2224   1120 R  45.9   0.0   0:05.33 fio

Sad face... Unfortunately once kswapd kicks in, performance also
plummets. This box only has 32G of RAM, and you can fill that in less
than 10 seconds doing buffered reads like that.
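
For anyone who wants to approximate this access pattern without fio, a
rough sketch follows. It is not the fio job behind the numbers above: the
function names, the per-thread read count and the seeding are made up for
illustration, and it assumes a regular file on a Unix system. It only
shows the shape of the workload (N threads, 32k buffered reads at random
aligned offsets).

```python
import os
import random
import threading

BLOCK_SIZE = 32 * 1024   # 32k reads, as in the case above
NUM_THREADS = 4          # 4 reader threads, as in the case above


def random_reads(path, num_reads, block_size=BLOCK_SIZE, seed=None):
    """Issue num_reads buffered random reads on path; return bytes read."""
    rng = random.Random(seed)
    # Plain O_RDONLY (no O_DIRECT), so every read goes through the page cache.
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        blocks = max(size // block_size, 1)
        total = 0
        for _ in range(num_reads):
            # Pick a random block-aligned offset inside the file.
            offset = rng.randrange(blocks) * block_size
            total += len(os.pread(fd, block_size, offset))
        return total
    finally:
        os.close(fd)


def run(path, num_threads=NUM_THREADS, reads_per_thread=1000):
    """Run num_threads concurrent readers; return total bytes read."""
    totals = [0] * num_threads

    def worker(i):
        # Hypothetical choice: seed each thread differently so offsets differ.
        totals[i] = random_reads(path, reads_per_thread, seed=i)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals)
```

Calling run() on a file much larger than memory is what pushes the page
cache into reclaim; on a file that fits in RAM it mostly measures cached
reads instead.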

I ran 4k and 32k tests, with 1 and 4 threads. But given the above
sadness, it quickly ends up looking the same for me.

What I noticed in my initial testing on Kent's patches (which was
focused on correctness) was that a read+write verify workload had
consistently better read throughput.

-- 
Jens Axboe