Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead

Jens Axboe <axboe@xxxxxxxxx> · Wed, 20 Jun 2018 16:28:54 -0600

On 6/20/18 1:58 PM, Andrew Morton wrote:
> On Wed, 20 Jun 2018 08:07:06 -0600 Jens Axboe <axboe@xxxxxxxxx> wrote:
> 
>> On 6/19/18 5:56 PM, Andrew Morton wrote:
>>> On Wed, 30 May 2018 08:42:05 -0600 Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>
>>>> The only caller of ->readpages() is from read-ahead, yet we don't
>>>> submit IO flagged with REQ_RAHEAD. This means we don't see it in
>>>> blktrace, for instance, which is a shame. We already make assumptions
>>>> about ->readpages() just being for read-ahead in the mpage
>>>> implementation, using readahead_gfp_mask(mapping) as out GFP mask of
>>>> choice.
>>>>
>>>> This small series fixes up mpage_readpages() to submit with
>>>> REQ_RAHEAD, which takes care of file systems using mpage_readpages().
>>>> The first patch is a prep patch, that makes do_mpage_readpage() take
>>>> an argument structure.
>>>
>>> Well gee.  That sounds like a total fluke and I don't see anything
>>> which prevents future code from using readpages for something other
>>> than readahead.
>>
>> Nothing prevents it, but it's always been the case. So we need to
>> embrace the fact one way or another.
> 
> All it will take is one simple (and desirable) cleanup to f2fs and this
> will all break.  Take that as an indication of the fragility of this
> assumption.

The assumption has been true for a decade or more. The only problem with
it is that we have this ->readpages() interface that is only used for
readahead. Hence it should be ->readahead() or similar, that avoid the
fragility argument.

>>> And what's happening with the individual ->readpage() calls which
>>> read_pages() does?  Do they still not get REQ_RAHEAD treatment?
>>
>> They should, yes.
> 
> hm, how does that happen.

Maybe we aren't talking about the same thing. Which individual
->readpage() exactly?

>>> This is kinda dirty, but we could add a readahead flag to blk_plug, set
>>> it in read_pages(), test it down in the block layer somewhere....
>>> That's grubby, but at least it wouldn't increase sizeof(task_struct)?
>>
>> That sounds way too dirty for my liking, I'd really not want to mix that
>> in with plugging.
> 
> That's the wrong way of thinking about it ;) Rename blk_plug to
> blk_state or something and view it as a communication package between
> the VFS level and the block layer and it all fits together pretty well.

We have a perfectly fine established interface for telling the block
layer what kind of IO we're submitting, and that's setting it in the
bio. Introducing a secondary and orthogonal interface for this is a
horrible idea, imho.

>> Why not just treat it like it is - readpages() is clearly just
>> read-ahead. This isn't something new I'm inventing, it's always been so.
>> Seems to me that we just need to make it explicit.
> 
> There's no reason at all why ->readpages() is exclusively for use by
> readahead!  I'm rather surprised that fsf2 is the only fs which is
> (should be) using ->readpages internally.

I totally agree, there's no reason why it is, and everybody is surprised
that that is the case. But the fact of the matter is that that is the
case, and always has been so. So I'll repeat what I said before, we need
to embrace that and make it EXPLICIT instead of side stepping the issue.

> Can we change blktrace instead?  Rename REQ_RAHEAD to REQ_READPAGES? 
> Then everything will fit together OK.

Why do you insist on working around it?! That's just creating another
assumption in another place, not fixing the actual issue.

Besides, this isn't going to be just about tracing. Yes, it'll be
awesome to actually get the right information from blktrace, since right
now nobody knows which parts are read-ahead and which ones are explicit
reads. Might be pretty darn useful to debug read-ahead issues.

The read-ahead information must be reliable, otherwise it's impossible
to introduce other dependencies on top of that. We're having a lot of
issues with termination of tasks that are stuck in issuing read-ahead.
If we can rely on the read-ahead information being there AND being
correct, then we can terminate reads early. This might not sound like
huge win, but we're talking tens of minutes for some cases. That's the
planned functional change behind this, but can't be done before we make
progress on the topic at hand.

-- 
Jens Axboe