On Wed, Feb 28, 2024 at 11:27:05PM -0800, Linus Torvalds wrote: > On Wed, 28 Feb 2024 at 23:20, Linus Torvalds > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > > > - take the lock exclusively if O_APPEND or if it *looks* like you > > might extend the file size. > > > > - otherwise, take the shared lock, and THEN RE-CHECK. The file size > > might have changed, so now you need to double-check that you're really > > not going to extend the size of the file, and if you are, you need to > > go back and take the inode lock exclusively after all. > > Same goes for the suid/sgid checks. You need to take the inode lock > shared to even have the i_mode be stable, and at that point you might > decide "oh, I need to clear suid/sgid after all" and have to go drop > the lock and retake it for exclusive access after all. I agree. And this is how we've done it in xfs as well. The inode lock is taken shared, then check for IS_NOSEC(inode) and if that's true keep the shared lock, otherwise upgrade to an exclusive lock. Note that this whole check also checks filesystem capability xattrs. I think you'd also get potential security issues or at least very subtle behavior. Someone could make a binary setuid and a concurrent write happens. chmod is taking the exclusive lock and there's a concurrent write going on changing the contents to something unsafe without being forced to remove the set*id bits. Similar for {g,u}id changes. Someone starts a chown() taking inode lock while someone issues a concurrent write. The chown changes the group ownership so that a writer would be forced to drop the sgid bit. But the writer proceeds keeping that bit. So that's all potentially pretty subtle side-effects we'd allow. And I've removed so many bugs in that area the last couple of years as well. So let's please use the shared lock.