Re: [PATCH v10 3/5] overlayfs: add __get xattr method

Mark Salyzyn <salyzyn@xxxxxxxxxxx> · Tue, 30 Jul 2019 09:54:09 -0700

On 7/30/19 8:55 AM, Stephen Smalley wrote:
On 7/26/19 2:30 PM, Mark Salyzyn wrote:
On 7/25/19 10:04 PM, Amir Goldstein wrote:
On Thu, Jul 25, 2019 at 7:22 PM Mark Salyzyn <salyzyn@xxxxxxxxxxx> 
wrote:
On 7/25/19 8:43 AM, Amir Goldstein wrote:
On Thu, Jul 25, 2019 at 6:03 PM Mark Salyzyn <salyzyn@xxxxxxxxxxx> 
wrote:
On 7/24/19 10:48 PM, Amir Goldstein wrote:
On Wed, Jul 24, 2019 at 10:57 PM Mark Salyzyn 
<salyzyn@xxxxxxxxxxx> wrote:
Because of the overlayfs getxattr recursion, the incoming inode 
fails
to update the selinux sid resulting in avc denials being reported
against a target context of u:object_r:unlabeled:s0.
This description is too brief for me to understand the root 
problem.
What's wring with the overlayfs getxattr recursion w.r.t the 
selinux
security model?
__vfs_getxattr (the way the security layer acquires the target sid
without recursing back to security to check the access permissions)
calls get xattr method, which in overlayfs calls vfs_getxattr on the
lower layer (which then recurses back to security to check 
permissions)
and reports back -EACCES if there was a denial (which is OK) and 
_no_
sid copied to caller's inode security data, bubbles back to the 
security
layer caller, which reports an invalid avc: message for
u:object_r:unlabeled:s0 (the uninitialized sid instead of the sid 
for
the lower filesystem target). The blocked access is 100% valid, 
it is
supposed to be blocked. This does however result in a cosmetic issue
that makes it impossible to use audit2allow to construct a rule that
would be usable to fix the access problem.

Ahhh you are talking about getting the security.selinux.* xattrs?
I was under the impression (Vivek please correct me if I wrong)
that overlayfs objects cannot have individual security labels and
They can, and we _need_ them for Android's use cases, upper and lower
filesystems.

Some (most?) union filesystems (like Android's sdcardfs) set sepolicy
from the mount options, we did not need this adjustment there of 
course.

the only way to label overlayfs objects is by mount options on the
entire mount? Or is this just for lower layer objects?

Anyway, the API I would go for is adding a @flags argument to
get() which can take XATTR_NOSECURITY akin to
FMODE_NONOTIFY, GFP_NOFS, meant to avoid recursions.
I do like it better (with the following 7 stages of grief below), best
for the future.

The change in this handler's API will affect all filesystem drivers
(well, my change affects the ABI, so it is not as-if I saved the world
from a module recompile) touching all filesystem sources with an even
larger audience of stakeholders. Larger audience of stakeholders, the
harder to get the change in ;-/. This is also concerning since I would
like this change to go to stable 4.4, 4.9, 4.14 and 4.19 where this
regression got introduced. I can either craft specific stable 
patches or
just let it go and deal with them in the android-common distributions
rather than seeking stable merged down. ABI/API breaks are a 
problem for
stable anyway ...

Use the memalloc_nofs_save/restore design pattern will avoid all that
grief.
As a matter of fact, this issue could and should be handled inside 
security
subsystem without bothering any other subsystem.
LSM have per task context right? That context could carry the recursion
flags to know that the getxattr call is made by the security 
subsystem itself.
The problem is not limited to union filesystems.
In general its a stacking issue. ecryptfs is also a stacking fs, 
out-of-tree
shiftfs as well. But it doesn't end there.
A filesystem on top of a loop device inside another filesystem could
also maybe result in security hook recursion (not sure if in practice).

Thanks,
Amir.

Good point, back to Stephen Smalley?

There are four __vfs_getxattr calls inside security, not sure I see 
any natural way to determine the recursion in security/selinux I can 
beg/borrow/steal from; but I get the strange feeling that it is 
better to detect recursion in __vfs_getxattr in this manner, and 
switch out checking in vfs_getxattr since it is localized to just 
fs/xattr.c. selinux might not be the only user of __vfs_getxattr 
nature ...

I have implemented and tested the solution where we add a flag to the 
.get method, it works. I would be tempted to submit that instead in 
case someone in the future can imagine using that flag argument to 
solve other problem(s) (if you build it, they will come).

<flips coin>

Will add a new per-process flag that __vfs_getxattr and vfs_getxattr 
plays with and see how it works and what it looks like.

As you say, SELinux is not the only user of __vfs_getxattr; in 
addition to the other security modules, there is the integrity/evm 
subsystem and ecryptfs.  Further, __vfs_getxattr does not merely skip 
LSM/SELinux-related processing; it also skips xattr_permission().  As 
such, I don't believe this is something that can be solved entirely 
within the security subsystem.

Not excited about a process flag to implicitly disable LSM/SELinux and 
other security-related processing on a code path; potential for abuse 
is high.

So you will not like my solution in "[PATCH v11 2/5] fs: __vfs_getxattr 
nesting paradigm"sent out this morning; so adding the flag option and 
widespread touching of _all_ the filesystem xattr.c/acl.c/inode.c/etc 
files to the calls is probably the easiest to stomach with the lowest 
attack surface.

Any other ideas (with less impact to tons of API/ABI/filesystems) that 
we have not thought about before I spin a v12 patch set?

-- Mark