On Wed, Nov 20, 2024 at 02:44:17AM +0100, Mateusz Guzik wrote:
> On Mon, Nov 18, 2024 at 07:03:30AM +0000, Al Viro wrote:
> > On Mon, Nov 18, 2024 at 03:00:39PM +0900, Jeongjun Park wrote:
> > > All the functions that added lock in this patch are called only via syscall,
> > > so in most cases there will be no noticeable performance issue.
> >
> > Pardon me, but I am unable to follow your reasoning.
> >
>
> I suspect the argument is that the overhead of issuing a syscall is big
> enough that the extra cost of the lock trip won't be visible, but
> that's not accurate -- atomics are measurable when added to syscalls,
> even on modern CPUs.
>
> > > And
> > > this data-race is not a problem that only occurs in theory. It is
> > > a bug that syzbot has been reporting for years. Many file systems that
> > > exist in the kernel lock inode_lock before calling vfs_getattr, so
> > > data-race does not occur, but only fs/stat.c has had a data-race
> > > for years. This alone shows that adding inode_lock to some
> > > functions is a good way to solve the problem without much
> > > performance degradation.
> >
> > Explain. First of all, these are, by far, the most frequent callers
> > of vfs_getattr(); what "many filesystems" are doing around their calls
> > of the same is irrelevant. Which filesystems, BTW? And which call
> > chains are you talking about? Most of the filesystems never call it
> > at all.
> >
> > Furthermore, on a lot of userland loads stat(2) is a very hot path -
> > it is called a lot. And the rwsem in question has plenty of takers -
> > both shared and exclusive. The effect of piling a lot of threads
> > that grab it shared on top of the existing mix is not something
> > I am ready to predict without experiments - not beyond "likely to be
> > unpleasant, possibly very much so".
> >
> > Finally, you have not offered any explanation of the reasons why
> > that data race matters - and "syzbot reporting" is not one. It is
> > possible that actual observable bugs exist, but it would be useful
> > to have at least one of those described in detail.
> >
> [snip]
>
> On the stock kernel it is at least theoretically possible to transiently
> observe a state which is mid-update (as in not valid), but I was under
> the impression this was known and considered not a problem.

Exactly.

>
> Nonetheless, as an example say an inode is owned by 0:0 and is being
> chowned to 1:1 and this is handled by setattr_copy.
>
> The ids are updated one after another:
> [snip]
> i_uid_update(idmap, attr, inode);
> i_gid_update(idmap, attr, inode);
> [/snip]
>
> So at least in principle it may be that someone issuing getattr in parallel
> will happen to spot 1:0 (as opposed to 0:0 or 1:1), which was never set
> on the inode and is merely an artifact of hitting the timing.
>
> This would be a bug, but I don't believe this is serious enough to
> justify taking the inode lock to get out of.

I don't think this is a serious issue. We don't guarantee consistent
snapshots and I don't see a reason why we should complicate setattr()
for that.
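
For illustration only, below is a rough userspace sketch of how one might
probe for the transient mixed-ownership window Mateusz describes. It is not
from this thread and not a confirmed reproducer: the scratch-file argument,
thread layout and 10-second probe window are arbitrary choices, it needs
root, and whether the mixed pair is ever actually observed depends on timing
and on the filesystem's own locking. One thread chowns a throwaway file back
and forth between 0:0 and 1:1 while another stat()s it and reports any
snapshot where st_uid and st_gid disagree. Build with -pthread and point it
at a file you do not care about.

#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static const char *path;	/* throwaway file, passed on the command line */
static volatile int stop;

/* Flip the owner between 0:0 and 1:1, the only two "valid" states. */
static void *chown_loop(void *arg)
{
	(void)arg;
	while (!stop) {
		if (chown(path, 0, 0) || chown(path, 1, 1)) {
			perror("chown");
			stop = 1;
		}
	}
	return NULL;
}

/* Snapshot the inode and flag any mixed pair such as 1:0, which was
 * never set as a whole. */
static void *stat_loop(void *arg)
{
	struct stat st;

	(void)arg;
	while (!stop) {
		if (stat(path, &st))
			continue;
		if (st.st_uid != st.st_gid) {
			fprintf(stderr, "mixed ids observed: %u:%u\n",
				(unsigned)st.st_uid, (unsigned)st.st_gid);
			stop = 1;
		}
	}
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t chowner, statter;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <scratch file>\n", argv[0]);
		return 1;
	}
	path = argv[1];

	pthread_create(&chowner, NULL, chown_loop, NULL);
	pthread_create(&statter, NULL, stat_loop, NULL);
	sleep(10);		/* probe for a while, then shut both loops down */
	stop = 1;
	pthread_join(chowner, NULL);
	pthread_join(statter, NULL);
	return 0;
}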