Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/20/18 5:23 PM, Andrew Morton wrote:
> On Wed, 20 Jun 2018 16:28:54 -0600 Jens Axboe <axboe@xxxxxxxxx> wrote:
> 
>>>> Nothing prevents it, but it's always been the case. So we need to
>>>> embrace the fact one way or another.
>>>
>>> All it will take is one simple (and desirable) cleanup to f2fs and this
>>> will all break.  Take that as an indication of the fragility of this
>>> assumption.
>>
>> The assumption has been true for a decade or more. The only problem with
>> it is that we have this ->readpages() interface that is only used for
>> readahead. Hence it should be ->readahead() or similar, that avoid the
>> fragility argument.
> 
> There's no reason why readpages shouldn't be used for non-readahead
> purposes.

I feel like we're going in circles here! Yes, I agree, fact is it isn't
and never has been. So either we acknowledge that ->readpages() is is
just read-ahead by nature, or we rename it, or in some other way
recognize that that is the case. See above.

>>>>> This is kinda dirty, but we could add a readahead flag to blk_plug, set
>>>>> it in read_pages(), test it down in the block layer somewhere....
>>>>> That's grubby, but at least it wouldn't increase sizeof(task_struct)?
>>>>
>>>> That sounds way too dirty for my liking, I'd really not want to mix that
>>>> in with plugging.
>>>
>>> That's the wrong way of thinking about it ;) Rename blk_plug to
>>> blk_state or something and view it as a communication package between
>>> the VFS level and the block layer and it all fits together pretty well.
>>
>> We have a perfectly fine established interface for telling the block
>> layer what kind of IO we're submitting, and that's setting it in the
>> bio. Introducing a secondary and orthogonal interface for this is a
>> horrible idea, imho.
> 
> ->readpages() callers don't have access to the bio(s).

It most certainly does, since it's the one submitting the IO. That's how
I'm currently propagating the read vs reada information.

>>>> Why not just treat it like it is - readpages() is clearly just
>>>> read-ahead. This isn't something new I'm inventing, it's always been so.
>>>> Seems to me that we just need to make it explicit.
>>>
>>> There's no reason at all why ->readpages() is exclusively for use by
>>> readahead!  I'm rather surprised that fsf2 is the only fs which is
>>> (should be) using ->readpages internally.
>>
>> I totally agree, there's no reason why it is, and everybody is surprised
>> that that is the case. But the fact of the matter is that that is the
>> case, and always has been so. So I'll repeat what I said before, we need
>> to embrace that and make it EXPLICIT instead of side stepping the issue.
> 
> My point is that f2fs (for example) *should* be using mpage_readpages()
> right now, for its non-readahead purposes.

I do hear that point. But it's somewhat of a fallacy. I'm perfectly
happy adding ->readahead() and having all the rest of the ->readpages()
be that, and then f2fs can use ->readpages(). At the same time, I don't
want to do that if there's no current users. For all I know and care,
f2fs will stay how it is forever. We can't base decision on how to
proceed on whether or not some random file system will decide to do
something else in the future.

> If we are to remove callers' ability to use readpages for non-readahead
> purposes we should rename the address_space field.

Totally agree, and I'd be happy to do that. So how about we do that? I
rename it to ->readahead (or ->readaheadpages()?), and then it's
perfectly clear what is going on. If f2fs wants to use ->readpages() for
non-readhead, then we can add ->readpages(). Honestly, it's
disappointing that we don't have any non-reada ->readpages.

>>> Can we change blktrace instead?  Rename REQ_RAHEAD to REQ_READPAGES? 
>>> Then everything will fit together OK.
>>
>> Why do you insist on working around it?! That's just creating another
>> assumption in another place, not fixing the actual issue.
> 
> All I've been told about is blktrace.

That was my cover.

>> Besides, this isn't going to be just about tracing. Yes, it'll be
>> awesome to actually get the right information from blktrace, since right
>> now nobody knows which parts are read-ahead and which ones are explicit
>> reads. Might be pretty darn useful to debug read-ahead issues.
>>
>> The read-ahead information must be reliable, otherwise it's impossible
>> to introduce other dependencies on top of that. We're having a lot of
>> issues with termination of tasks that are stuck in issuing read-ahead.
>> If we can rely on the read-ahead information being there AND being
>> correct, then we can terminate reads early. This might not sound like
>> huge win, but we're talking tens of minutes for some cases. That's the
>> planned functional change behind this, but can't be done before we make
>> progress on the topic at hand.
> 
> Changelog material right there.

I deliberately didn't want to include that, since it'd muddy the waters
on what is a 100% standalone change.

-- 
Jens Axboe




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux