Re: [PATCH] userns: Convert xfs to use kuid/kgid where appropriate

Brian Foster <bfoster@xxxxxxxxxx> · Thu, 20 Jun 2013 11:27:04 -0400

On 06/20/2013 09:54 AM, Dwight Engen wrote:
> On Thu, 20 Jun 2013 10:13:41 +1000
> Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> 
>> On Wed, Jun 19, 2013 at 11:09:48AM -0400, Dwight Engen wrote:
>>> Use uint32 from init_user_ns for xfs internal uid/gid
>>> representation in acl, xfs_icdinode. Conversion of kuid/gid is done
>>> at the vfs boundary, other user visible xfs specific interfaces
>>> (bulkstat, eofblocks filter) expect uint32 init_user_ns uid/gid
>>> values.
>>
>> It's minimal, but I'm not sure it's complete. I'll comment on that
>> in response to Eric's comments...
>>
>>> Signed-off-by: Dwight Engen <dwight.engen@xxxxxxxxxx>
>> ....
>>> --- a/fs/xfs/xfs_fs.h
>>> +++ b/fs/xfs/xfs_fs.h
>>> @@ -347,8 +347,8 @@ typedef struct xfs_error_injection {
>>>  struct xfs_eofblocks {
>>>  	__u32		eof_version;
>>>  	__u32		eof_flags;
>>> -	uid_t		eof_uid;
>>> -	gid_t		eof_gid;
>>> +	__u32		eof_uid;
>>> +	__u32		eof_gid;
>>>  	prid_t		eof_prid;
>>>  	__u32		pad32;
>>>  	__u64		eof_min_file_size;
>>
>> The patch doesn't do namespace conversion of these uid/gids, but I'm
>> not sure it should...
> 
> The ids are only advisory, if the caller doesn't specify
> XFS_EOF_FLAGS_?ID blocks from any inode in the fs can be reclaimed
> regardless of id. Because of this, I think at a minimum we should
> change XFS_IOC_FREE_EOFBLOCKS to require capable(CAP_SYS_ADMIN), which
> somewhat implies init_user_ns based ids.
> 
> To make this really userns aware, I think we could:
>   - leave the fields as uid_t
>   - make XFS_IOC_FREE_EOFBLOCKS require nsown_capable(CAP_SYS_ADMIN)
>   - check kuid_has_mapping(current_user_ns()) for each
>     inode. This would be a change in behavior when called
>     from !init_user_ns, limiting the scope of inodes the ioctl could
>     affect.
>   - change xfs_inode_match_id() to use uid_eq(VFS_I(ip)->i_uid,
>     make_kuid(current_user_ns(), eofb->eof_uid))
> 
> Does that sound reasonable?
> 

Hi Dwight,

If I understand correctly, the proposition is to turn
XFS_EOF_FREE_EOFBLOCKS into administrator only functionality and run ns
conversions on the inode uid/gid and associated eofb values for the ID
filtering functionality.

The latter sounds reasonable to me, though I'm not so sure about the
CAP_SYS_ADMIN bit. For example, I think we'd expect a regular user to be
able to run an eofblocks scan against files covered under his quota.

Perhaps the right thing to do here is to restrict global (and project
quota?) scans to CAP_SYS_ADMIN and uid/gid based scans to processes with
the appropriate permissions (i.e., CAP_SYS_ADMIN, matching uid/gid or
CAP_FOWNER). Thoughts?

Brian

>>> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
>>> index 96e344e..70ba410 100644
>>> --- a/fs/xfs/xfs_icache.c
>>> +++ b/fs/xfs/xfs_icache.c
>>> @@ -617,7 +617,7 @@ restart:
>>>  
>>>  /*
>>>   * Background scanning to trim post-EOF preallocated space. This
>>> is queued
>>> - * based on the 'background_prealloc_discard_period' tunable (5m
>>> by default).
>>> + * based on the 'speculative_prealloc_lifetime' tunable (5m by
>>> default). */
>>>  STATIC void
>>>  xfs_queue_eofblocks(
>>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>>> index 7f7be5f..8049976 100644
>>> --- a/fs/xfs/xfs_inode.c
>>> +++ b/fs/xfs/xfs_inode.c
>>> @@ -1268,8 +1268,8 @@ xfs_ialloc(
>>>  	ip->i_d.di_onlink = 0;
>>>  	ip->i_d.di_nlink = nlink;
>>>  	ASSERT(ip->i_d.di_nlink == nlink);
>>> -	ip->i_d.di_uid = current_fsuid();
>>> -	ip->i_d.di_gid = current_fsgid();
>>> +	ip->i_d.di_uid = from_kuid(&init_user_ns, current_fsuid());
>>> +	ip->i_d.di_gid = from_kgid(&init_user_ns, current_fsgid());
>>
>> Why all new inodes be created in the init_user_ns? Shouldn't this be
>> current_user_ns()?
> 
> current_fsuid() is the kuid_t from whatever userns current is in,
> which we are converting to a flat uint32 since i_d is the on disk inode.
> This field is then used in xfs_setup_inode() to populate
> VFS_I(ip)->i_uid. Most other filesystems would use inode_init_owner(),
> but xfs does not (I assume because it wants to handle SGID itself based
> on XFS_INHERIT_GID and irix_sgid_inherit).
> 
>> Same question throughout - why do you use init_user_ns for all these
>> UID conversions, when the whole point is to have awareness of
>> different namespaces?
> 
> Yep, there are instances you point out below where we can just use
> inode->i_uid instead of converting back from the flat value, so I'll fix
> those up.
> 
>>>  	xfs_set_projid(ip, prid);
>>>  	memset(&(ip->i_d.di_pad[0]), 0, sizeof(ip->i_d.di_pad));
>>>  
>>> @@ -1308,7 +1308,7 @@ xfs_ialloc(
>>>  	 */
>>>  	if ((irix_sgid_inherit) &&
>>>  	    (ip->i_d.di_mode & S_ISGID) &&
>>> -	    (!in_group_p((gid_t)ip->i_d.di_gid))) {
>>> +	    (!in_group_p(make_kgid(&init_user_ns,
>>> ip->i_d.di_gid)))) { ip->i_d.di_mode &= ~S_ISGID;
>>>  	}
>>>  
>>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>>> index 5e99968..daa6127 100644
>>> --- a/fs/xfs/xfs_ioctl.c
>>> +++ b/fs/xfs/xfs_ioctl.c
>>> @@ -981,7 +981,7 @@ xfs_ioctl_setattr(
>>>  	 * to the file owner ID, except in cases where the
>>>  	 * CAP_FSETID capability is applicable.
>>>  	 */
>>> -	if (current_fsuid() != ip->i_d.di_uid
>>> && !capable(CAP_FOWNER)) {
>>> +	if (!inode_owner_or_capable(&ip->i_vnode)) {
>>
>> 				    VFS_I(ip)
>>
>>>  		code = XFS_ERROR(EPERM);
>>>  		goto error_return;
>>>  	}
>>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>>> index ca9ecaa..bf96cf8 100644
>>> --- a/fs/xfs/xfs_iops.c
>>> +++ b/fs/xfs/xfs_iops.c
>>> @@ -420,8 +420,8 @@ xfs_vn_getattr(
>>>  	stat->dev = inode->i_sb->s_dev;
>>>  	stat->mode = ip->i_d.di_mode;
>>>  	stat->nlink = ip->i_d.di_nlink;
>>> -	stat->uid = ip->i_d.di_uid;
>>> -	stat->gid = ip->i_d.di_gid;
>>> +	stat->uid = make_kuid(&init_user_ns, ip->i_d.di_uid);
>>> +	stat->gid = make_kgid(&init_user_ns, ip->i_d.di_gid);
>>
>> Why not:
>>
>> 	stat->uid = inode->i_uid;
>> 	stat->gid = inode->i_gid;
>>
>>>  	stat->ino = ip->i_ino;
>>>  	stat->atime = inode->i_atime;
>>>  	stat->mtime = inode->i_mtime;
>>> @@ -488,8 +488,8 @@ xfs_setattr_nonsize(
>>>  	int			mask = iattr->ia_valid;
>>>  	xfs_trans_t		*tp;
>>>  	int			error;
>>> -	uid_t			uid = 0, iuid = 0;
>>> -	gid_t			gid = 0, igid = 0;
>>> +	kuid_t			uid = GLOBAL_ROOT_UID, iuid
>>> = GLOBAL_ROOT_UID;
>>> +	kgid_t			gid = GLOBAL_ROOT_GID, igid
>>> = GLOBAL_ROOT_GID; struct xfs_dquot	*udqp = NULL, *gdqp =
>>> NULL; struct xfs_dquot	*olddquot1 = NULL, *olddquot2 = NULL;
>>>  
>>> @@ -522,13 +522,13 @@ xfs_setattr_nonsize(
>>>  			uid = iattr->ia_uid;
>>>  			qflags |= XFS_QMOPT_UQUOTA;
>>>  		} else {
>>> -			uid = ip->i_d.di_uid;
>>> +			uid = make_kuid(&init_user_ns,
>>> ip->i_d.di_uid);
>>
>> 			uid = VFS_I(ip)->i_uid;
>>>  		}
>>>  		if ((mask & ATTR_GID) && XFS_IS_GQUOTA_ON(mp)) {
>>>  			gid = iattr->ia_gid;
>>>  			qflags |= XFS_QMOPT_GQUOTA;
>>>  		}  else {
>>> -			gid = ip->i_d.di_gid;
>>> +			gid = make_kgid(&init_user_ns,
>>> ip->i_d.di_gid);
>>
>> 			gid = VFS_I(ip)->i_gid;
>>>  		}
>> .....
>>> @@ -561,8 +563,8 @@ xfs_setattr_nonsize(
>>>  		 * while we didn't have the inode locked, inode's
>>> dquot(s)
>>>  		 * would have changed also.
>>>  		 */
>>> -		iuid = ip->i_d.di_uid;
>>> -		igid = ip->i_d.di_gid;
>>> +		iuid = make_kuid(&init_user_ns, ip->i_d.di_uid);
>>> +		igid = make_kgid(&init_user_ns, ip->i_d.di_gid);
>>>  		gid = (mask & ATTR_GID) ? iattr->ia_gid : igid;
>>>  		uid = (mask & ATTR_UID) ? iattr->ia_uid : iuid;
>>
>> Same again - you can just use VFS_I(ip)->i_uid/VFS_I(ip)->i_gid
>> here.
>>
>>> @@ -1172,8 +1174,8 @@ xfs_setup_inode(
>>>  
>>>  	inode->i_mode	= ip->i_d.di_mode;
>>>  	set_nlink(inode, ip->i_d.di_nlink);
>>> -	inode->i_uid	= ip->i_d.di_uid;
>>> -	inode->i_gid	= ip->i_d.di_gid;
>>> +	inode->i_uid	= make_kuid(&init_user_ns,
>>> ip->i_d.di_uid);
>>> +	inode->i_gid	= make_kgid(&init_user_ns,
>>> ip->i_d.di_gid);
>>
>> current name space?
> 
> I believe that is what this is doing, but I think it will be more
> proper to do it the same as other filesystems:
> 
> i_uid_write(inode, ip->i_d.di_uid);
> i_gid_write(inode, ip->i_d.di_gid);
> 
>>>  
>>>  	switch (inode->i_mode & S_IFMT) {
>>>  	case S_IFBLK:
>>> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
>>> index b75c9bb..94a2a8f 100644
>>> --- a/fs/xfs/xfs_qm.c
>>> +++ b/fs/xfs/xfs_qm.c
>>> @@ -1651,8 +1651,8 @@ xfs_qm_write_sb_changes(
>>>  int
>>>  xfs_qm_vop_dqalloc(
>>>  	struct xfs_inode	*ip,
>>> -	uid_t			uid,
>>> -	gid_t			gid,
>>> +	__uint32_t		di_uid,
>>> +	__uint32_t		di_gid,
>>
>> xfs_dqid_t
>>
>> And there's no need to rename the variables - that just causes
>> unnecessary churn, and the fact it is a dquot ID is documented by
>> the type.
> 
> Yep using the type will be clearer, thanks.
> 
>> Cheers,
>>
>> Dave.
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs