Hi, A couple of clarity-related suggestions: On 25/02/2024 23:06, Al Viro wrote:
=================================================================== The ways in which RCU pathwalk can ruin filesystem maintainer's day =================================================================== The problem: exposure of filesystem code to lockless environment ================================================================ Filesystem methods can usually count upon VFS-provided warranties regarding the stability of objects they are called to act upon; at the very least, they can expect the dentries/inodes/superblocks involved to remain live throughout the operation. Life would be much more painful without that; however, such warranties do not come for free. The problem is that access patterns are heavily biased; every system call getting an absolute pathname will have to start at root directory, etc. Having each of them in effect write "I'd been here" on the same memory objects would cost quite a bit. As the result, we try to keep the fast path stores-free, bumping no refcounts and taking no locks. Details are described elsewhere, but the bottom line for filesystems is that some methods may be called with much looser warranties than usual. Of course, from the filesystem
There is a slight apparent contradiction here between "at the very least, they can expect [...] to remain live throughout the operation" in the first paragraph (which sounds like they _do_ have these guarantees) and most of the second paragraph (which says they _don't_ have these guarantees). I *think* what you are saying is that dentries/inodes/sbs involved will indeed stay live (i.e. allocated), but that there are OTHER warranties you might usually expect that are not there, such as objects not being locked and potentially changing underneath your filesystem's VFS callback or being in a partial state or other indirectly pointed-to objects not being safe to access. The source of the confusion for me is that you say "such warranties do not come for free" and it sounds like it refers to the liveness warranty that you just mentioned, whereas it actually refers to all the other warranties that you did NOT mention yet at this point in the document. How about changing it like this: """ Filesystem methods can usually count upon a number of VFS-provided warranties regarding the stability of the dentries/inodes/superblocks they are called to act upon. For example, they always can expect these objects to remain live throughout the operation; life would be much more painful without that. However, such warranties do not come for free and other warranties may not always be provided. [...] """ (As a side note, you may also want to actually link the docs we have for RCU lookup where you say "details are described elsewhere".)
What methods are affected? ========================== The list of the methods that could run into that fun: ======================== ================================== ================= method indication that the call is unsafe unstable objects ======================== ================================== =================
I'd wish for explicit definitions of "unsafe" (which is a terminology you do use more or less consistently in this doc) and "unstable". The definitions don't need mathematical precision, but there should be a quick one-line explanation of each. I think "the call is unsafe" means that it doesn't have all the usual safety warranties (as detailed above). I think "unstable" means "not locked, can change underneath the function" (but not that it can be freed), but it would be good to have it spelled out.
->d_hash(d, ...) none - any call might be d ->d_compare(d, ...) none - any call might be d ->d_revalidate(d, f) f & LOOKUP_RCU d ->d_manage(d, f) f d ->permission(i, m) m & MAY_NOT_BLOCK i ->get_link(d, i, ...) d == NULL i ->get_inode_acl(i, t, f) f == LOOKUP_RCU i ======================== ================================== =================
FWIW, thanks for writing this document, it's a really useful addition. Vegard