On Wed, Jun 09, 2010 at 11:41:21AM +0200, Jan Kara wrote:
> On Wed 09-06-10 17:33:36, Nick Piggin wrote:
> > On Tue, Jun 01, 2010 at 01:39:37PM +0200, Christoph Hellwig wrote:
> > > int inode_change_ok(const struct inode *inode, struct iattr *attr)
> > > {
> > > -	int retval = -EPERM;
> > > 	unsigned int ia_valid = attr->ia_valid;
> > >
> > > +	/*
> > > +	 * First check size constraints. These can't be overriden using
> > > +	 * ATTR_FORCE.
> > > +	 */
> > > +	if (attr->ia_mode & ATTR_SIZE) {
> > > +		int error = inode_newsize_ok(inode, attr->ia_size);
> > > +		if (error)
> > > +			return error;
> > > +	}
> >
> > Hmm, I don't know if we can do this unless you have audited the
> > filesystems (in which case they should be on the cc list of this
> > patch).
> >
> > The problem is whether the i_size is valid and stable at this
> > point. And it doesn't help even if you do leave the inode_newsize_ok
> > check inside the vmtruncate part of the fs if the check incorrectly
> > fails here.
> >
> > ocfs2 performs inode_change_ok outside ocfs2_rw_lock and
> > ocfs2_inode_lock, and inode_newsize_ok inside; cifs holds i_lock
> > while checking inode_newsize_ok and updating size; gfs2 inside
> > gfs2_trans_begin.
> That's a good point. For all local filesystems I know, holding i_mutex is
> enough for having stable i_size. But for clustered filesystems it
> definitely isn't. They have to hold cluster locks to be able to reliably
> check current i_size (at least OCFS2 does). Looking at what
> inode_newsize_ok currently does, i_size is only used to decide whether
> we need to check for rlimit or not. So we could falsely miss this
> check (other node is truncating the file below new offset)...

Yes, or falsely disallow a shrinking truncate if it is above our rlimit.

> Hmm, OK, so
> we really need the cluster lock...
> BTW: Mark, don't we need the cluster lock also for the permission
> checks in inode_change_ok? Otherwise we could see:
>   Node1                            Node2
>                                    chmod("file", 000);
>   truncate("file", 0)
>     inode_change_ok still see old perms
>       -> success
>
> And Node1 and Node2 can be fully serialized via some userspace
> synchronization and still hit this so it's not just a race...

That's a good point too, yes. I think if the inode_change_ok check were
moved inside the cluster lock, that would solve that problem and
Christoph's i_size problem here.
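
To make the ordering issue concrete, here is a rough userspace model of
the Node1/Node2 scenario above. It is only a sketch and not kernel code:
struct cluster_inode, struct node_inode, cluster_lock(), change_ok() and
the two truncate_* helpers are all hypothetical names made up for this
example, and "taking the cluster lock" is assumed to refresh the node's
cached mode and size from the shared state. change_ok() merely stands in
for the permission part of inode_change_ok.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: authoritative inode state shared by the cluster. */
struct cluster_inode {
	unsigned int mode;	/* permission bits */
	long long size;		/* stands in for i_size */
	bool locked;		/* stands in for the cluster lock */
};

/* One node's in-memory copy, which may be stale until it is refreshed. */
struct node_inode {
	struct cluster_inode *shared;
	unsigned int mode;
	long long size;
};

/* Assumption: taking the cluster lock also refreshes the cached fields. */
static void cluster_lock(struct node_inode *n)
{
	assert(!n->shared->locked);
	n->shared->locked = true;
	n->mode = n->shared->mode;
	n->size = n->shared->size;
}

static void cluster_unlock(struct node_inode *n)
{
	n->shared->locked = false;
}

/* Stand-in for the permission check: require the owner write bit. */
static int change_ok(const struct node_inode *n)
{
	return (n->mode & 0200) ? 0 : -EACCES;
}

/* Another node changes the mode; this node's cache is not updated. */
static void remote_chmod(struct cluster_inode *c, unsigned int mode)
{
	c->mode = mode;
}

/* Check first, lock later: the check runs on possibly stale fields. */
static int truncate_check_first(struct node_inode *n, long long newsize)
{
	int err = change_ok(n);

	if (err)
		return err;
	cluster_lock(n);
	n->shared->size = n->size = newsize;
	cluster_unlock(n);
	return 0;
}

/* Lock first, then check: the check runs on refreshed fields. */
static int truncate_lock_first(struct node_inode *n, long long newsize)
{
	int err;

	cluster_lock(n);
	err = change_ok(n);
	if (!err)
		n->shared->size = n->size = newsize;
	cluster_unlock(n);
	return err;
}
```

In this model, after remote_chmod(&shared, 0000) a node whose cached
mode is still 0644 gets 0 from truncate_check_first() and the file is
truncated, while truncate_lock_first() returns -EACCES and leaves the
size alone. The same ordering argument would apply to a size check made
against a stale i_size.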