ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes: > "Serge E. Hallyn" <serge@xxxxxxxxxx> writes: > >> Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx): >>> >>> "Serge E. Hallyn" <serge@xxxxxxxxxx> writes: >>> >>> > diff --git a/fs/xattr.c b/fs/xattr.c >>> > index 7e3317c..75cc65a 100644 >>> > --- a/fs/xattr.c >>> > +++ b/fs/xattr.c >>> > @@ -170,12 +170,29 @@ int __vfs_setxattr_noperm(struct dentry *dentry, const char *name, >>> > const void *value, size_t size, int flags) >>> > { >>> > struct inode *inode = dentry->d_inode; >>> > - int error = -EAGAIN; >>> > + int error; >>> > + void *wvalue = NULL; >>> > + size_t wsize = 0; >>> > int issec = !strncmp(name, XATTR_SECURITY_PREFIX, >>> > XATTR_SECURITY_PREFIX_LEN); >>> > >>> > - if (issec) >>> > + if (issec) { >>> > inode->i_flags &= ~S_NOSEC; >>> > + >>> > + if (!strcmp(name, "security.capability")) { >>> > + error = cap_setxattr_convert_nscap(dentry, value, size, >>> > + &wvalue, &wsize); >>> > + if (error < 0) >>> > + return error; >>> > + if (wvalue) { >>> > + value = wvalue; >>> > + size = wsize; >>> > + } >>> > + } >>> > + } >>> > + >>> > + error = -EAGAIN; >>> > + >>> >>> Why is the conversion in __vfs_setxattr_noperm and not in setattr as >>> was done for posix_acl_fix_xattr_from_user? >> >> I think I was thinking I wanted to catch all the vfs_setxattr operations, >> but I don't think that's right. Moving to setxattr seems right. I'll >> look around a bit more. > > Thanks. This is one of these little details that we want a good answer > to why there. If you can document that in your patch description when > you resend I would appreciate it. Ok. Grrr. Looking at this a little more getting it correct where we call the conversion operation is critical. I believe the current placement of cap_setxattr_convert_nscap is actively wrong. In particular unless I am misleading something this will trigger multiple conversions when setting one of these attributes on overlayfs. The stragey I adopted for for posix acls is: On a write from userspace convert from current_user_ns() to &init_user_ns. On a write to the filesystem convert from &init_user_ns to fs_user_ns. On a read from the filesystem convert from fs_user_ns to &init_user_ns On a read from the kernel to userspace convert from &init_user_ns to current_user_ns(). Overall a good strategy but no one we can trivially adopt for the capability xattr as the second write to filesystem method does not appear to actually exist for anything except for posix acls. I need to think a little more about how we want to accomplish this for the capability xattr. My apoligies for leading you down a path that has all of these bumps and then being sufficiently distracted not to help you through this maze. The only easy solution I can see is to just always keep things in &init_user_ns inside the kernel. That works until we bring fuse or other unprivileged mounts onboard that have storage outside of the kernel. Seth and I will have to rework that for fuse support but that sounds better than not letting such an issue prevent us from merging the code. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html