Re: [RFC] documentation on filesystem exposure to RCU pathwalk from fs maintainers' POV

On Mon, Feb 26, 2024 at 07:35:17AM +0100, Vegard Nossum wrote:

> There is a slight apparent contradiction here between "at the very
> least, they can expect [...] to remain live throughout the operation" in
> the first paragraph (which sounds like they _do_ have these guarantees)
> and most of the second paragraph (which says they _don't_ have these
> guarantees).
> 
> I *think* what you are saying is that dentries/inodes/sbs involved will
> indeed stay live (i.e. allocated), but that there are OTHER warranties
> you might usually expect that are not there, such as objects not being
> locked and potentially changing underneath your filesystem's VFS
> callback or being in a partial state or other indirectly pointed-to
> objects not being safe to access.

Live != memory object hasn't been freed yet.  It's a lot stronger than
that.  And most of the filesystem methods get those stronger warranties;
life would be very hard if we did not have those.

E.g. when you are in the middle of ->read(), you know that struct file
passed to you won't reach ->release() until after your ->read() returns,
that the filesystem it's on hasn't even started to be shut down, that
its in-core inode won't get to ->evict_inode(), that its dentry is
still associated with the same inode and will stay that way until you are
done, etc.
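
To make the "caller holds a reference" part concrete, here is a minimal
userspace sketch.  It is not kernel code: struct toy_file and all the toy_*
helpers are made up for illustration, and the counter is deliberately
non-atomic and single-threaded.  The only point is that the destructor
cannot run while the caller's reference is held across the method call.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct file; not the kernel's type. */
struct toy_file {
	int refcount;		/* references currently held */
	bool released;		/* set once the toy ->release() has run */
};

/* Toy ->release(): runs only when the last reference goes away. */
static void toy_release(struct toy_file *f)
{
	f->released = true;
}

static void toy_get(struct toy_file *f)
{
	f->refcount++;
}

static void toy_put(struct toy_file *f)
{
	if (--f->refcount == 0)
		toy_release(f);
}

/* Toy ->read(): relies on the caller's reference for its whole duration. */
static bool toy_read(struct toy_file *f)
{
	return !f->released;	/* true for as long as the caller is pinned */
}

/* Caller side: pin the object, call the method, unpin. */
static bool toy_call_read(struct toy_file *f)
{
	bool live;

	toy_get(f);
	live = toy_read(f);
	toy_put(f);
	return live;
}
```

With one reference held by the opener, toy_call_read() always sees a live
object; toy_release() runs only after the final toy_put().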

Normally we do get those kinds of warranties: the caller holds references
to the objects we are asked to operate upon.  However, the fast
path of pathname resolution (everything's in the VFS caches, no IO
or blocking operations needed, etc.) is an exception.  Several filesystem
methods (the ones involved in the fast path) may be called with
warranties weaker than what they (and the rest of the
methods) normally get.  Note that e.g. ->lookup() does not need to worry:
it's off the fast path pretty much by definition, and VFS switches to
pinning objects before calling anything of that sort.
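
"Switches to pinning objects" has one catch worth spelling out: grabbing a
reference has to fail if the object has already started dying.  A rough
userspace sketch of that idea follows, using C11 atomics.  struct toy_obj,
toy_get_not_dead() and toy_kill() are hypothetical names, and the "negative
count means dead" convention is an assumption of this sketch; the kernel's
own helper for dentries is lockref_get_not_dead(), which this does not
reproduce.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object; a negative count means its destructor has started. */
struct toy_obj {
	atomic_int count;
};

/* Try to pin the object; fail if it is already on its way out. */
static bool toy_get_not_dead(struct toy_obj *o)
{
	int old = atomic_load(&o->count);

	while (old >= 0) {
		/* On failure, old is updated to the current value and we retry. */
		if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
			return true;
	}
	return false;	/* dead: the caller must give up on this object */
}

/* Mark the object dead; succeeds only if nobody holds a reference. */
static bool toy_kill(struct toy_obj *o)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&o->count, &expected, -1);
}
```

Once toy_get_not_dead() has succeeded, toy_kill() cannot succeed until the
reference is dropped; once toy_kill() has succeeded, no new reference can be
taken.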

"Unsafe call" refers to the method calls made by RCU pathwalk with
weaker warranties.  Some of the objects passed to those might
already be on their way through their destructors.

> Filesystem methods can usually count upon a number of VFS-provided
> warranties regarding the stability of the dentries/inodes/superblocks
> they are called to act upon. For example, they always can expect these
> objects to remain live throughout the operation; life would be much more
> painful without that.
> 
> However, such warranties do not come for free and other warranties may
> not always be provided. [...]
> """

Maybe...

> (As a side note, you may also want to actually link the docs we have for
> RCU lookup where you say "details are described elsewhere".)
> 
> > What methods are affected?
> > ==========================
> > 
> > 	The list of the methods that could run into that fun:
> > 
> > ========================	==================================	=================
> > 	method			indication that the call is unsafe	unstable objects
> > ========================	==================================	=================
> 
> I'd wish for explicit definitions of "unsafe" (which is a terminology
> you do use more or less consistently in this doc) and "unstable". The
> definitions don't need mathematical precision, but there should be a
> quick one-line explanation of each.
 
See above.

> I think "the call is unsafe" means that it doesn't have all the usual
> safety warranties (as detailed above).
> 
> I think "unstable" means "not locked, can change underneath the
> function" (but not that it can be freed), but it would be good to have
> it spelled out.

Nope.  "Locked" is not an issue.  "Might be hit by a destructor called by
another thread right under your nose" is.  It's _that_ unpleasant.  Fortunately,
most of the nastiness is on the VFS side, but there's a good reason why
quite a few filesystems simply bail out and tell VFS to piss off and not
come back without having grabbed the references, so that nothing of that
sort would have to be dealt with.
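
For what "simply bail out" looks like, here is a userspace sketch of a
->permission()-style method.  In the kernel the unsafe call to ->permission()
is indicated by MAY_NOT_BLOCK in the mask, and returning -ECHILD asks the VFS
to retry with references held.  Everything named toy_* or TOY_* below is
made up for illustration, including the flag values.

```c
#include <errno.h>

/* Toy flag values for this sketch only. */
#define TOY_MAY_WRITE		0x02
#define TOY_MAY_READ		0x04
#define TOY_MAY_NOT_BLOCK	0x80	/* stands in for "this call is unsafe" */

/* Hypothetical inode: just the access bits the toy filesystem grants. */
struct toy_inode {
	unsigned int allowed;
};

static int toy_permission(const struct toy_inode *inode, int mask)
{
	/* Unsafe call: refuse, so the caller comes back with references held. */
	if (mask & TOY_MAY_NOT_BLOCK)
		return -ECHILD;

	/* Safe call: the usual warranties hold, so do the real check. */
	if ((unsigned int)mask & ~inode->allowed)
		return -EACCES;
	return 0;
}
```

The cost of this choice is that the fast path is lost for anything that
reaches this method; the benefit is that the rest of the function never has
to think about objects being torn down underneath it.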



