On 3/24/21 4:19 PM, Andrew Morton wrote:
> On Wed, 24 Mar 2021 16:13:48 -0600 Jens Axboe <axboe@xxxxxxxxx> wrote:
>
>> On 3/24/21 4:02 PM, Andrew Morton wrote:
>>> On Wed, 24 Mar 2021 13:48:08 -0600 Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>>> On Tue, Mar 2, 2021 at 4:43 PM <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> The patch titled
>>>>>      Subject: mm: provide filemap_range_needs_writeback() helper
>>>>> has been added to the -mm tree. Its filename is
>>>>>      mm-provide-filemap_range_needs_writeback-helper.patch
>>>>
>>>> Are you still planning on flushing this out for 5.12?
>>>
>>> Oh. No, I wasn't planning on that. I saw nothing which made me
>>> think it was that urgent.
>>
>> Ehm ok, I think that was the plan when we originally talked about
>> this series. At least that was what was in my original emails on
>> this topic :-)
>>
>>> What is the justification? How much performance benefit are we
>>> talking here?
>>
>> Well, it's pretty huge. If we return "nope, can't do this
>> non-blocking", io_uring has to bump the operation to an async
>> thread, so CPU cost goes way up, and latencies do too.
>>
>> Problem is, we're now at -rc4. My understanding was that we'd get
>> this in for 5.12, and obviously way sooner than this. But I kind of
>> lost track of it once it went into your tree, until I started
>> thinking about it earlier today. While it's not the end of the
>> world to wait for 5.13, the current situation does suck a lot for
>> certain workloads.
>
> I'd be OK with sending it to Linus now, but we really should put some
> numbers behind "suck a lot" to justify it. And a -stable backport
> might be justified if the benefit is large enough, and if "certain
> workloads" are common enough.
>
> But with what we have here, I and everyone else considering these
> patches are going in blind!

The backstory is that an internal workload complained because it was
using too much CPU, and when I took a look, we had a lot of io_uring
workers going to town. For an async buffered-read style workload I
normally expect _zero_ offloads to a worker thread, but this one had
tons of them. I'd drop caches and things would look good again, but a
minute later we'd regress back to using workers. Turns out that every
minute something was reading parts of the device, which would add page
cache for that inode. I put patches like these into our kernel, and
the problem was solved.

Obviously that's not a great story, since there are no hard numbers
there, and I'd have to generate those for you. Which I surely can.
While the original premise of the patches was fixing excessive CPU
usage, the general observation is that beyond burning a lot more CPU,
offloading to a thread also always adds latency.

So let me know if you want me to generate some numbers. They will be
artificial, but at least it'll be something beyond a story.
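For reference, the gist of the helper is something like the sketch
below. This is the idea rather than the queued patch verbatim, so
details may differ from what's sitting in -mm:

/*
 * Sketch: does the byte range contain pages we could block on for a
 * buffered read, i.e. dirty, locked, or under-writeback pages? If
 * not, a NOWAIT read can be served inline from the page cache.
 */
bool filemap_range_needs_writeback(struct address_space *mapping,
                                   loff_t start_byte, loff_t end_byte)
{
        XA_STATE(xas, &mapping->i_pages, start_byte >> PAGE_SHIFT);
        pgoff_t max = end_byte >> PAGE_SHIFT;
        struct page *page;

        /* Cheap out: no dirty or writeback pages in the mapping at all */
        if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) &&
            !mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
                return false;
        if (end_byte < start_byte)
                return false;

        /* Walk the range and look for anything we might block on */
        rcu_read_lock();
        xas_for_each(&xas, page, max) {
                if (xas_retry(&xas, page))
                        continue;
                if (xa_is_value(page))
                        continue;
                if (PageDirty(page) || PageLocked(page) ||
                    PageWriteback(page))
                        break;
        }
        rcu_read_unlock();
        return page != NULL;
}

The intent is that the non-blocking read path can attempt the buffered
read inline when this returns false, and only punt to a worker when it
returns true.

--
Jens Axboe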