Re: [PATCH] fs: prevent data-race due to missing inode_lock when calling vfs_getattr

Christian Brauner <brauner@xxxxxxxxxx> · Wed, 20 Nov 2024 10:19:45 +0100

On Wed, Nov 20, 2024 at 02:44:17AM +0100, Mateusz Guzik wrote:
> On Mon, Nov 18, 2024 at 07:03:30AM +0000, Al Viro wrote:
> > On Mon, Nov 18, 2024 at 03:00:39PM +0900, Jeongjun Park wrote:
> > > All the functions that added lock in this patch are called only via syscall,
> > > so in most cases there will be no noticeable performance issue.
> > 
> > Pardon me, but I am unable to follow your reasoning.
> > 
> 
> I suspect the argument is that the overhead of issuing a syscall is big
> enough that the extra cost of taking the lock trip wont be visible, but
> that's not accurate -- atomics are measurable when added to syscalls,
> even on modern CPUs.
> 
> > > And
> > > this data-race is not a problem that only occurs in theory. It is
> > > a bug that syzbot has been reporting for years. Many file systems that
> > > exist in the kernel lock inode_lock before calling vfs_getattr, so
> > > data-race does not occur, but only fs/stat.c has had a data-race
> > > for years. This alone shows that adding inode_lock to some
> > > functions is a good way to solve the problem without much 
> > > performance degradation.
> > 
> > Explain.  First of all, these are, by far, the most frequent callers
> > of vfs_getattr(); what "many filesystems" are doing around their calls
> > of the same is irrelevant.  Which filesystems, BTW?  And which call
> > chains are you talking about?  Most of the filesystems never call it
> > at all.
> > 
> > Furthermore, on a lot of userland loads stat(2) is a very hot path -
> > it is called a lot.  And the rwsem in question has a plenty of takers -
> > both shared and exclusive.  The effect of piling a lot of threads
> > that grab it shared on top of the existing mix is not something
> > I am ready to predict without experiments - not beyond "likely to be
> > unpleasant, possibly very much so".
> > 
> > Finally, you have not offered any explanations of the reasons why
> > that data race matters - and "syzbot reporting" is not one.  It is
> > possible that actual observable bugs exist, but it would be useful
> > to have at least one of those described in details.
> > 
> [snip]
> 
> On the stock kernel it is at least theoretically possible to transiently
> observe a state which is mid-update (as in not valid), but I was under
> the impression this was known and considered not a problem.

Exactly.

> 
> Nonetheless, as an example say an inode is owned by 0:0 and is being
> chowned to 1:1 and this is handled by setattr_copy.
> 
> The ids are updated one after another:
> [snip]
>         i_uid_update(idmap, attr, inode);
>         i_gid_update(idmap, attr, inode);
> [/snip]
> 
> So at least in principle it may be someone issuing getattr in parallel
> will happen to spot 1:0 (as opposed to 0:0 or 1:1), which was never set
> on the inode and is merely an artifact of hitting the timing.
> 
> This would be a bug, but I don't believe this is serious enough to
> justify taking the inode lock to get out of. 

I don't think this is a serious issue. We don't guarantee consistent
snapshots and I don't see a reason why we should complicate setattr()
for that.