Re: [RFC] documentation on filesystem exposure to RCU pathwalk from fs maintainers' POV

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Hi,

A couple of clarity-related suggestions:

On 25/02/2024 23:06, Al Viro wrote:
===================================================================
The ways in which RCU pathwalk can ruin filesystem maintainer's day
===================================================================

The problem: exposure of filesystem code to lockless environment
================================================================

Filesystem methods can usually count upon VFS-provided warranties
regarding the stability of objects they are called to act upon; at the
very least, they can expect the dentries/inodes/superblocks involved to
remain live throughout the operation.

Life would be much more painful without that; however, such warranties
do not come for free.  The problem is that access patterns are heavily
biased; every system call getting an absolute pathname will have to
start at root directory, etc.  Having each of them in effect write
"I'd been here" on the same memory objects would cost quite a bit.
As the result, we try to keep the fast path stores-free, bumping
no refcounts and taking no locks.  Details are described elsewhere,
but the bottom line for filesystems is that some methods may be called
with much looser warranties than usual.  Of course, from the filesystem

There is a slight apparent contradiction here between "at the very
least, they can expect [...] to remain live throughout the operation" in
the first paragraph (which sounds like they _do_ have these guarantees)
and most of the second paragraph (which says they _don't_ have these
guarantees).

I *think* what you are saying is that dentries/inodes/sbs involved will
indeed stay live (i.e. allocated), but that there are OTHER warranties
you might usually expect that are not there, such as objects not being
locked and potentially changing underneath your filesystem's VFS
callback or being in a partial state or other indirectly pointed-to
objects not being safe to access.

The source of the confusion for me is that you say "such warranties do
not come for free" and it sounds like it refers to the liveness warranty
that you just mentioned, whereas it actually refers to all the other
warranties that you did NOT mention yet at this point in the document.

How about changing it like this:

"""
Filesystem methods can usually count upon a number of VFS-provided
warranties regarding the stability of the dentries/inodes/superblocks
they are called to act upon. For example, they always can expect these
objects to remain live throughout the operation; life would be much more
painful without that.

However, such warranties do not come for free and other warranties may
not always be provided. [...]
"""

(As a side note, you may also want to actually link the docs we have for
RCU lookup where you say "details are described elsewhere".)

What methods are affected?
==========================

	The list of the methods that could run into that fun:

========================	==================================	=================
	method			indication that the call is unsafe	unstable objects
========================	==================================	=================

I'd wish for explicit definitions of "unsafe" (which is a terminology
you do use more or less consistently in this doc) and "unstable". The
definitions don't need mathematical precision, but there should be a
quick one-line explanation of each.

I think "the call is unsafe" means that it doesn't have all the usual
safety warranties (as detailed above).

I think "unstable" means "not locked, can change underneath the
function" (but not that it can be freed), but it would be good to have
it spelled out.

->d_hash(d, ...) 		none - any call might be		d
->d_compare(d, ...)		none - any call might be		d
->d_revalidate(d, f)		f & LOOKUP_RCU				d
->d_manage(d, f)		f					d
->permission(i, m)		m & MAY_NOT_BLOCK			i
->get_link(d, i, ...)		d == NULL				i
->get_inode_acl(i, t, f)	f == LOOKUP_RCU				i
========================	==================================	=================

FWIW, thanks for writing this document, it's a really useful addition.


Vegard




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux