Re: [PATCH 2/2] bcachefs: Buffered write path now can avoid the inode lock

Christian Brauner <brauner@xxxxxxxxxx> · Thu, 29 Feb 2024 10:46:48 +0100

On Wed, Feb 28, 2024 at 11:27:05PM -0800, Linus Torvalds wrote:
> On Wed, 28 Feb 2024 at 23:20, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> >
> >  - take the lock exclusively if O_APPEND or if it *looks* like you
> > might extend the file size.
> >
> >  - otherwise, take the shared lock, and THEN RE-CHECK. The file size
> > might have changed, so now you need to double-check that you're really
> > not going to extend the size of the file, and if you are, you need to
> > go back and take the inode lock exclusively after all.
> 
> Same goes for the suid/sgid checks. You need to take the inode lock
> shared to even have the i_mode be stable, and at that point you might
> decide "oh, I need to clear suid/sgid after all" and have to go drop
> the lock and retake it for exclusive access after all.

I agree. And this is how we've done it in xfs as well. The inode lock is
taken shared, then check for IS_NOSEC(inode) and if that's true keep the
shared lock, otherwise upgrade to an exclusive lock. Note that this
whole check also checks filesystem capability xattrs.

I think you'd also get potential security issues or at least very subtle
behavior. Someone could make a binary setuid and a concurrent write
happens. chmod is taking the exclusive lock and there's a concurrent
write going on changing the contents to something unsafe without being
forced to remove the set*id bits.

Similar for {g,u}id changes. Someone starts a chown() taking inode lock
while someone issues a concurrent write. The chown changes the group
ownership so that a writer would be forced to drop the sgid bit. But the
writer proceeds keeping that bit.

So that's all potentially pretty subtle side-effects we'd allow. And
I've removed so many bugs in that area the last couple of years as well.
So let's please use the shared lock.