Re: [PATCH 00/17] VFS: Filesystem information and notifications [ver #17]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/3/20 12:02 PM, Jeff Layton wrote:
> On Tue, 2020-03-03 at 09:55 -0700, Jens Axboe wrote:
>> On 3/3/20 9:51 AM, Jeff Layton wrote:
>>> On Tue, 2020-03-03 at 08:44 -0700, Jens Axboe wrote:
>>>> On 3/3/20 7:24 AM, Greg Kroah-Hartman wrote:
>>>>> On Tue, Mar 03, 2020 at 03:13:26PM +0100, Jann Horn wrote:
>>>>>> On Tue, Mar 3, 2020 at 3:10 PM Greg Kroah-Hartman
>>>>>> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>>>> On Tue, Mar 03, 2020 at 02:43:16PM +0100, Greg Kroah-Hartman wrote:
>>>>>>>> On Tue, Mar 03, 2020 at 02:34:42PM +0100, Miklos Szeredi wrote:
>>>>>>>>> On Tue, Mar 3, 2020 at 2:14 PM Greg Kroah-Hartman
>>>>>>>>> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>>> Unlimited beers for a 21-line kernel patch?  Sign me up!
>>>>>>>>>>>
>>>>>>>>>>> Totally untested, barely compiled patch below.
>>>>>>>>>>
>>>>>>>>>> Ok, that didn't even build, let me try this for real now...
>>>>>>>>>
>>>>>>>>> Some comments on the interface:
>>>>>>>>
>>>>>>>> Ok, hey, let's do this proper :)
>>>>>>>
>>>>>>> Alright, how about this patch.
>>>>>>>
>>>>>>> Actually tested with some simple sysfs files.
>>>>>>>
>>>>>>> If people don't strongly object, I'll add "real" tests to it, hook it up
>>>>>>> to all arches, write a manpage, and all the fun fluff a new syscall
>>>>>>> deserves and submit it "for real".
>>>>>>
>>>>>> Just FYI, io_uring is moving towards the same kind of thing... IIRC
>>>>>> you can already use it to batch a bunch of open() calls, then batch a
>>>>>> bunch of read() calls on all the new fds and close them at the same
>>>>>> time. And I think they're planning to add support for doing
>>>>>> open()+read()+close() all in one go, too, except that it's a bit
>>>>>> complicated because passing forward the file descriptor in a generic
>>>>>> way is a bit complicated.
>>>>>
>>>>> It is complicated, I wouldn't recommend using io_ring for reading a
>>>>> bunch of procfs or sysfs files, that feels like a ton of overkill with
>>>>> too much setup/teardown to make it worth while.
>>>>>
>>>>> But maybe not, will have to watch and see how it goes.
>>>>
>>>> It really isn't, and I too thinks it makes more sense than having a
>>>> system call just for the explicit purpose of open/read/close. As Jann
>>>> said, you can't currently do a linked sequence of open/read/close,
>>>> because the fd passing between them isn't done. But that will come in
>>>> the future. If the use case is "a bunch of files", then you could
>>>> trivially do "open bunch", "read bunch", "close bunch" in three separate
>>>> steps.
>>>>
>>>> Curious what the use case is for this that warrants a special system
>>>> call?
>>>>
>>>
>>> Agreed. I'd really rather see something more general-purpose than the
>>> proposed readfile(). At least with NFS and SMB, you can compound
>>> together fairly arbitrary sorts of operations, and it'd be nice to be
>>> able to pattern calls into the kernel for those sorts of uses.
>>>
>>> So, NFSv4 has the concept of a current_stateid that is maintained by the
>>> server. So basically you can do all this (e.g.) in a single compound:
>>>
>>> open <some filehandle get a stateid>
>>> write <using that stateid>
>>> close <same stateid>
>>>
>>> It'd be nice to be able to do something similar with io_uring. Make it
>>> so that when you do an open, you set the "current fd" inside the
>>> kernel's context, and then be able to issue io_uring requests that
>>> specify a magic "fd" value that use it.
>>>
>>> That would be a really useful pattern.
>>
>> For io_uring, you can link requests that you submit into a chain. Each
>> link in the chain is done in sequence. Which means that you could do:
>>
>> <open some file><read from that file><close that file>
>>
>> in a single sequence. The only thing that is missing right now is a way
>> to have the return of that open propagated to the 'fd' of the read and
>> close, and it's actually one of the topics to discuss at LSFMM next
>> month.
>>
>> One approach would be to use BPF to handle this passing, another
>> suggestion has been to have the read/close specify some magic 'fd' value
>> that just means "inherit fd from result of previous". The latter sounds
>> very close to the stateid you mention above, and the upside here is that
>> it wouldn't explode the necessary toolchain to need to include BPF.
>>
>> In other words, this is really close to being reality and practically
>> feasible.
>>
> 
> Excellent.
> 
> Yes, the latter is exactly what I had in mind for this. I suspect that
> that would cover a large fraction of the potential use-cases for this.
> 
> Basically, all you'd need to do is keep a pointer to struct file in the
> internal state for the chain. Then, allow userland to specify some magic
> fd value for subsequent chained operations that says to use that instead
> of consulting the fdtable. Maybe use -4096 (-MAX_ERRNO - 1)?

Yeah I think that'd be a suitable way to signal that.

> That would cover the smb or nfs server sort of use cases, I think. For
> the sysfs cases, I guess you'd need to dispatch several chains, but that
> doesn't sound _too_ onerous.

The magic fd would be per-chain, so doing multiple chains wouldn't
really matter at all.

Let me try and hack this up, should be pretty trivial.

> In fact, with that you should even be able to emulate the proposed
> readlink syscall in a userland library.

Exactly

-- 
Jens Axboe




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux