Re: [PATCH 2/2] check ATTR_SIZE constraints in inode_change_ok

Nick Piggin <npiggin@xxxxxxx> · Wed, 9 Jun 2010 20:06:16 +1000

On Wed, Jun 09, 2010 at 11:41:21AM +0200, Jan Kara wrote:
> On Wed 09-06-10 17:33:36, Nick Piggin wrote:
> > On Tue, Jun 01, 2010 at 01:39:37PM +0200, Christoph Hellwig wrote:
> > >  int inode_change_ok(const struct inode *inode, struct iattr *attr)
> > >  {
> > > -	int retval = -EPERM;
> > >  	unsigned int ia_valid = attr->ia_valid;
> > >  
> > > +	/*
> > > +	 * First check size constraints.  These can't be overridden using
> > > +	 * ATTR_FORCE.
> > > +	 */
> > > +	if (ia_valid & ATTR_SIZE) {
> > > +		int error = inode_newsize_ok(inode, attr->ia_size);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > 
> > Hmm, I don't know if we can do this unless you have audited the
> > filesystems (in which case they should be on the cc list of this
> > > patch).
> > 
> > The problem is whether the i_size is valid and stable at this
> > point. And it doesn't help even if you do leave the inode_newsize_ok
> > check inside the vmtruncate part of the fs if the check incorrectly
> > fails here.
> > 
> > ocfs2 performs inode_change_ok outside ocfs2_rw_lock and
> > ocfs2_inode_lock, and inode_newsize_ok inside; cifs holds i_lock
> > while checking inode_newsize_ok and updating size; gfs2 inside
> > gfs2_trans_begin.
>   That's a good point. For all local filesystems I know, holding i_mutex is
> enough for having stable i_size. But for clustered filesystems it
> definitely isn't. They have to hold cluster locks to be able to reliably
> check current i_size (at least OCFS2 does). Looking at what
> inode_newsize_ok currently does, i_size is only used to decide whether
> we need to check for rlimit or not. So we could falsely miss this
> check (other node is truncating the file below new offset)...

Yes, or falsely disallow a shrinking truncate if it is above our
rlimit.
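
To make both failure modes concrete, here is a rough userspace sketch of
the size check. It is only a model: model_newsize_ok() and MODEL_NO_LIMIT
are made-up names, the i_size and rlimit are passed in as plain integers,
and the s_maxbytes and swapfile checks are left out. It assumes, as
described above, that i_size is only used to decide whether the rlimit
check applies.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical marker for "no file size limit" in this model. */
#define MODEL_NO_LIMIT (-1LL)

/*
 * Simplified model of the size check: the rlimit is only consulted
 * when the new size is larger than the i_size we believe the file has.
 * Not kernel code; all names here are illustrative.
 */
static int model_newsize_ok(long long i_size, long long offset,
			    long long fsize_limit)
{
	assert(offset >= 0);

	if (i_size < offset) {
		/* Looks like an extending truncate: enforce the rlimit. */
		if (fsize_limit != MODEL_NO_LIMIT && offset > fsize_limit)
			return -EFBIG;
	}
	/* Looks like a shrinking truncate: the rlimit is not consulted. */
	return 0;
}
```

With a limit of 1000 and a truncate to 2000: if the file is really 5000
bytes but this node still sees 500, the call fails with -EFBIG even though
the truncate shrinks the file. In the opposite case (really 500, stale
5000) the call returns 0 and the rlimit check is skipped.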

> Hmm, OK, so
> we really need the cluster lock...
>   BTW: Mark, don't we need the cluster lock also for the permission
> checks in inode_change_ok? Otherwise we could see:
> 	Node1				Node2
> 	chmod("file", 000);
> 					truncate("file", 0)
> 					  inode_change_ok still sees old perms
> 					    -> success
> 
>   And Node1 and Node2 can be fully serialized via some userspace
> synchronization and still hit this so it's not just a race...

That's a good point too, yes. I think if the inode_change_ok check
were moved inside the cluster lock, that would solve that problem
and Christoph's i_size problem here.
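
Roughly what I mean by the ordering, as a userspace toy rather than ocfs2
code. Every name below (model_cluster, model_inode, model_cluster_lock and
so on) is invented for illustration, and it assumes that taking the
cluster lock is the point where a node refreshes its cached inode fields.
The real locking is of course more involved.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; none of these are ocfs2 or VFS symbols. */
struct model_cluster {
	unsigned int mode;		/* authoritative permission bits */
};

struct model_inode {
	unsigned int cached_mode;	/* this node's possibly stale copy */
	bool locked;
};

/* Assumption: taking the cluster lock refreshes the cached fields. */
static void model_cluster_lock(struct model_inode *inode,
			       const struct model_cluster *c)
{
	inode->cached_mode = c->mode;
	inode->locked = true;
}

static void model_cluster_unlock(struct model_inode *inode)
{
	inode->locked = false;
}

/* Stand-in for a permission check: requires the owner write bit. */
static int model_change_ok(const struct model_inode *inode)
{
	return (inode->cached_mode & 0200) ? 0 : -EPERM;
}

/* Check first, lock second: the check sees whatever this node cached. */
static int model_setattr_check_outside(struct model_inode *inode,
				       const struct model_cluster *c)
{
	int error = model_change_ok(inode);

	if (error)
		return error;
	model_cluster_lock(inode, c);
	/* ... the attribute change would be applied here ... */
	model_cluster_unlock(inode);
	return 0;
}

/* Lock first, check second: the check sees the refreshed fields. */
static int model_setattr_check_inside(struct model_inode *inode,
				      const struct model_cluster *c)
{
	int error;

	model_cluster_lock(inode, c);
	error = model_change_ok(inode);
	/* ... the attribute change would be applied only if !error ... */
	model_cluster_unlock(inode);
	return error;
}
```

If the cluster-wide mode is already 000 while this node still caches 0644,
the first variant returns 0 and the second returns -EPERM, which is the
Node1/Node2 case quoted above.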
