Re: [PATCH 2/2] bcachefs: Buffered write path now can avoid the inode lock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 29, 2024 at 10:46:48AM +0100, Christian Brauner wrote:
> On Wed, Feb 28, 2024 at 11:27:05PM -0800, Linus Torvalds wrote:
> > On Wed, 28 Feb 2024 at 23:20, Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > >
> > >  - take the lock exclusively if O_APPEND or if it *looks* like you
> > > might extend the file size.
> > >
> > >  - otherwise, take the shared lock, and THEN RE-CHECK. The file size
> > > might have changed, so now you need to double-check that you're really
> > > not going to extend the size of the file, and if you are, you need to
> > > go back and take the inode lock exclusively after all.
> > 
> > Same goes for the suid/sgid checks. You need to take the inode lock
> > shared to even have the i_mode be stable, and at that point you might
> > decide "oh, I need to clear suid/sgid after all" and have to go drop
> > the lock and retake it for exclusive access after all.
> 
> I agree. And this is how we've done it in xfs as well. The inode lock is
> taken shared, then check for IS_NOSEC(inode) and if that's true keep the
> shared lock, otherwise upgrade to an exclusive lock. Note that this
> whole check also checks filesystem capability xattrs.
> 
> I think you'd also get potential security issues or at least very subtle
> behavior. Someone could make a binary setuid and a concurrent write
> happens. chmod is taking the exclusive lock and there's a concurrent
> write going on changing the contents to something unsafe without being
> forced to remove the set*id bits.
> 
> Similar for {g,u}id changes. Someone starts a chown() taking inode lock
> while someone issues a concurrent write. The chown changes the group
> ownership so that a writer would be forced to drop the sgid bit. But the
> writer proceeds keeping that bit.

That's the kind of race that's not actually a race - if the syscalls
being invoked at the same time, it's fine if the result is as if the
write happened first, then the chown.

For there to be an actual race one of two things would have to be the
case:
 - the chown/chmod path would have to be dependent on the on the data
   the write path is changing, or
 - there'd have to be a problem wnith the write path seeing inconsistent
   state mid modification

But I can't claim 100% the second case isn't a factor, I don't know the
security side of things as well, and the XFS way sounds nice and clean.




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux