Re: [PATCH] vfs: avoid spurious dentry ref/unref cycle on open

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Wed, 7 Aug 2024 06:32:32 +0100

On Wed, Aug 07, 2024 at 05:57:07AM +0200, Mateusz Guzik wrote:

[there'll be a separate reply with what I hope might be a usable
approach]

> Yes, this is my understanding of the code and part of my compliant. :)
> 
> Things just work(tm) as is with NULLified pointers, but this is error-prone.

And carrying the arseloads of information (which ones do and which do not
need to be dropped) is *less* error-prone?  Are you serious?

> As a hypothetical suppose there is code executing some time after
> vfs_open which looks at nd->path.dentry and by finding the pointer is
> NULL it concludes the lookup did not work out.
> 
> If such code exists *and* the pointer is poisoned in the above sense
> (notably merely branching on it with kasan already traps), then the
> consumer will be caught immediately during coverage testing by
> syzkaller.

You are much too optimistic about the quality of test coverage in this
particular area.

> If such code exists but the pointer is only nullified, one is only
> going to find out the hard way when some functionality weirdly breaks.

To do _useful_ asserts, one needs invariants to check.  And "we got
to this check after having passed through that assignment at some
earlier point" is not it.  That's why I'm asking questions about
the state.

The thing is, suppose I (or you, or somebody else) is trying to modify
the whole thing.  There's a magical mystery assert in the way; what
should be done with it?  Move it/split it/remove it/do something
random and hope syzkaller won't catch anything?  If I can reason
about the predicate being checked, I can at least start figuring out
what should be done.  If not, it's bloody guaranteed to rot.

This particular area (pathwalk machinery) has a nasty history of
growing complexity once in a while, with following cleanups and
massage to get it back into more or less tolerable shape.
And refactoring that had been _painful_ - I'd done more than
a few there.

As far as I can tell, at the moment this flag (and yes, I've seen its
removal in the next version) is "we'd called vfs_open_consume() at
some point, then found ourselves still in RCU mode or we'd called
vfs_open_consume() more than once".

This is *NOT* a property of state; it's a property of execution
history.  The first part is checked in the wrong place - one of
the invariants (trivially verified by code examination) is that
LOOKUP_RCU is never regained after it had been dropped.  The
only place where it can be set is path_init() and calling _that_
between path_init() and terminate_walk() would be
	a) a hard and very visible bug
	b) would've wiped your flag anyway.
So that part of the check is basically "we are not calling
vfs_open_consume() under rcu_read_lock()".  Which is definitely
a desirable property, since ->open() can block.  So can
mnt_want_write() several lines prior.  Invariant here is
"the places where we set FMODE_OPENED or FMODE_CREATED may
not have LOOKUP_RCU".  Having
        if (!(file->f_mode & (FMODE_OPENED | FMODE_CREATED))) {
		error = complete_walk(nd);
		if (error)
			return error;
	}
in the beginning of do_open() guarantees that for vfs_open()
call there.  All other places where that can happen are in
lookup_open() or called from it (via ->atomic_open() to
finish_open()).  And *that* definitely should not be done
in RCU mode, due to
        if (open_flag & O_CREAT)
                inode_lock(dir->d_inode);
        else
                inode_lock_shared(dir->d_inode);
        dentry = lookup_open(nd, file, op, got_write);
in the sole caller of that thing.  Again, can't grab a blocking
lock under rcu_read_lock().  Which is why we have this
                if (WARN_ON_ONCE(nd->flags & LOOKUP_RCU))
                        return ERR_PTR(-ECHILD);
        } else {
                /* create side of things */
                if (nd->flags & LOOKUP_RCU) {
                        if (!try_to_unlazy(nd))
                                return ERR_PTR(-ECHILD);
                }
slightly prior to that call.  WARN_ON_ONCE is basically "lookup_fast()
has returned NULL and stayed in RCU mode", which should never happen.
try_to_unlazy() is straight "either switch to non-RCU mode or return an
error" - that's what this function is for.  No WARN_ON after that - it
would only obfuscate things.

*IF* you want to add debugging checks for that kind of stuff, just call
that assert_nonrcu(nd), make it check and whine and feel free to slap
them in reasonable amount of places (anything that makes a reader go
"for fuck sake, hadn't we (a) done that on the entry to this function
and (b) done IO since then, anyway?" is obviously not reasonable, etc. -
no more than common sense limitations).

Another common sense thing: extra asserts won't confuse syzkaller, but
they very much can confuse a human reader.  And any rewrites are done
by humans...

As for the double call of vfs_open_consume()...  You do realize that
the damn thing wouldn't have reached that check if it would ever have
cause to be triggered, right?  Seeing that we call
static inline struct mnt_idmap *mnt_idmap(const struct vfsmount *mnt)
{
        /* Pairs with smp_store_release() in do_idmap_mount(). */
        return smp_load_acquire(&mnt->mnt_idmap);
}
near the beginning of do_open(), ~20 lines before the place where
you added that check...

I'm not sure it makes sense to defend against a weird loop appearing
out of nowhere near the top of call chain, but if you want to do that,
this is not the right place for that.