On Tue, Oct 15, 2013 at 01:19:05PM -0700, Christoph Hellwig wrote: > On Sat, Oct 05, 2013 at 01:19:18PM +1000, Dave Chinner wrote: > > Yup, we don't hold the i_mutex *at all* through the fast path for > > direct IO writes. Having to grab the i_mutex on every IO just for > > the extremely unlikely case we need to remove a suid bit on the file > > would add a significant serialisation point into the direct Io model > > that XFS uses, and is the difference between 50,000 and 2+ million > > direct IO IOPS to a single file. > > > > I'm unwilling to sacrifice the concurrency of direct IO writes just > > to shut up ths warning, especially as the actual modifications that > > are made to remove SUID bits are correctly serialised within XFS > > once notify_change() calls ->setattr(). If it really matters, I'll > > just open code file_remove_suid() into XFS like ocfs2 does just so > > we don't get that warning being emitted by trinity. > > But the i_lock doesn't synchronize against the VFS modifying various > struct inode fields. Sure, but file_remove_suid() doesn't actually modify any VFS inode structures until we process the flags and the modifications within ->setattr, which in XFS are all done under the XFS_ILOCK_EXCL via xfs_setattr_mode(). i.e. both the VFS and XFS inodes S*ID bits are removed only under XFS_ILOCK_EXCL.... Hence I see no point in adding extra serialisation via the i_mutex to this path when we can just do something like: killsuid = should_remove_suid(file->f_path.dentry); if (killsuid) { struct iattr newattr; newattr.ia_valid = ATTR_FORCE | killsuid; error = xfs_setattr_nonsize(ip, &newattr, 0); if (error) return error; } and not require the i_mutex at all... Indeed, this is exactly what do_truncate() does - the check outside the i_mutex, then calls notify_change() with the i_mutex held. IOWs, the i_mutex does nothing to serialise concurrent attempts to check and remove S*ID bits.... > The right fix is to take i_mutex just in case > we actually need to remove the suid bit. The patch below should fix it, > although I need to write a testcase that actually exercises it first. > > Dave (J.): if you have time to try the patch below please go ahead, > if not I'll make sure to write an isolated test ASAP to verify it and > will then submit the change. > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index 4c749ab..e879f96 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -590,8 +590,22 @@ restart: > * If we're writing the file then make sure to clear the setuid and > * setgid bits if the process is not being run by root. This keeps > * people from modifying setuid and setgid binaries. > + * > + * Note that file_remove_suid must be called with the i_mutex held, > + * so we have to go through some hoops here to make sure we hold it. > */ > - return file_remove_suid(file); > + if (!IS_NOSEC(inode) && should_remove_suid(file->f_path.dentry)) { > + if (*iolock == XFS_IOLOCK_SHARED) { > + mutex_lock(&inode->i_mutex); > + error = file_remove_suid(file); > + mutex_unlock(&inode->i_mutex); Lock inversion - i_mutex is always outside i_iolock. i.e. this will deadlock if someone else calls xfs_rw_ilock(XFS_ILOCK_EXCL) at the same time because we already hold the i_iolock in shared mode. It's the same case that this function already handles for the EOF zeroing relocking. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs