Re: Commit 13c164b1a186 - regression for LSMs/SELinux?

Ian Kent <raven@xxxxxxxxxx> · Fri, 25 Sep 2020 11:37:43 +0800

On Thu, 2020-09-24 at 10:16 -0400, Stephen Smalley wrote:
> On Thu, Sep 24, 2020 at 4:36 AM Ondrej Mosnacek <omosnace@xxxxxxxxxx>
> wrote:
> > On Wed, Sep 23, 2020 at 3:55 AM Ian Kent <raven@xxxxxxxxxx> wrote:
> > > On Tue, 2020-09-22 at 09:33 +0200, Ondrej Mosnacek wrote:
> > > > On Mon, Sep 21, 2020 at 6:30 PM Al Viro <
> > > > viro@xxxxxxxxxxxxxxxxxx>
> > > > wrote:
> > > > > On Mon, Sep 21, 2020 at 06:09:22PM +0200, Christoph Hellwig
> > > > > wrote:
> > > > > > [adding Linus and Al]
> > > > > > 
> > > > > > On Mon, Sep 21, 2020 at 04:51:35PM +0200, Ondrej Mosnacek
> > > > > > wrote:
> > > > > > > Hi folks,
> > > > > > > 
> > > > > > > It seems that after commit 13c164b1a186 ("autofs: switch
> > > > > > > to
> > > > > > > kernel_write") there is now an extra LSM permission
> > > > > > > required
> > > > > > > (for the
> > > > > > > current task to write to the automount pipe) for
> > > > > > > processes
> > > > > > > accessing
> > > > > > > some yet-to-to-be mounted directory on which an autofs
> > > > > > > mount is
> > > > > > > set
> > > > > > > up. The call chain is:
> > > > > > > [...]
> > > > > > > autofs_wait() ->
> > > > > > > autofs_notify_daemon() ->
> > > > > > > autofs_write() ->
> > > > > > > kernel_write() ->
> > > > > > > rw_verify_area() ->
> > > > > > > security_file_permission()
> > > > > > > 
> > > > > > > The bug report that led me to this commit is at [1].
> > > > > > > 
> > > > > > > Technically, this is a regression for LSM users, since
> > > > > > > this is
> > > > > > > a
> > > > > > > kernel-internal operation and an LSM permission for the
> > > > > > > current
> > > > > > > task
> > > > > > > shouldn't be required. Can this patch be reverted?
> > > > > > > Perhaps
> > > > > > > __kernel_{read|write}() could instead be renamed to
> > > > > > > kernel_*_nocheck()
> > > > > > > so that the name is more descriptive?
> > > > > > 
> > > > > > So we obviously should not break existing user space and
> > > > > > need to
> > > > > > fix
> > > > > > this ASAP.  The trivial "fix" would be to export
> > > > > > __kernel_write
> > > > > > again
> > > > > > and switch autofs to use it.  The other option would be a
> > > > > > FMODE
> > > > > > flag
> > > > > > to bypass security checks, only to be set if the callers
> > > > > > ensures
> > > > > > they've been valided (i.e. in autofs_prepare_pipe).
> > > > 
> > > > IMHO that sounds like an overkill in this scenario. I don't
> > > > think it
> > > > makes sense to do the LSM check here (or at least not against
> > > > the
> > > > current task's creds), because it is not the current task that
> > > > wants
> > > > to communicate with the daemon, it just wants to to access some
> > > > directory on the system that just happens to be special to the
> > > > kernel,
> > > > which needs to do some communication on the side to service
> > > > this
> > > > request. So if we do want to do any LSM check here, there
> > > > should at
> > > > least be some "bool internal" flag passed to the LSM,
> > > > signalizing
> > > > that
> > > > this is an internal read/write operation that wasn't directly
> > > > initiated/requested by the current process. SELinux could then
> > > > either
> > > > use the kernel secid instead of the current task's secid or
> > > > skip the
> > > > check completely in such case.
> > > 
> > > Perhaps, but see below.
> > > 
> > > > I'd like Stephen to weigh in on this, but it looks he might be
> > > > on
> > > > vacation right now...
> > > > 
> > > > > > Any opinions?
> > > > > 
> > > > > Reexport for now.  Incidentally, what is LSM doing rejecting
> > > > > writes
> > > > > into a pipe?
> > > > 
> > > > With SELinux at least, what is allowed or denied is defined in
> > > > the
> > > > policy. And the policy usually defaults to everything denied
> > > > and then
> > > > you add rules to allow what needs (and makes sense) to be
> > > > allowed.
> > > > Since until kernel 5.8 random processes didn't need to write to
> > > > pipes
> > > > created by the automount daemon, it has never been explicitly
> > > > allowed
> > > > and so the automounting now fails. It is in no way obvious that
> > > > all
> > > > processes should have the permission to talk to the automount
> > > > daemon
> > > > just to traverse the filesystem...
> > > 
> > > I think you might have misunderstood what lead to this, just a
> > > bit.
> > > 
> > > Previously the __kern_write() function was used for this
> > > communication
> > > and Christoph's patch changed that to use kern_write() instead.
> > > 
> > > In theory that's a good idea because kern_write() adds some
> > > additional
> > > sanity checks, one being a call to rw_verify_area() which is
> > > where the
> > > security_file_permission() call fails.
> > > 
> > > So previously any random process could avoid these checks by
> > > calling
> > > __kern_write() so the change to kern_write() is, in theory,
> > > that's a
> > > good thing and simply reverting that hunk in Christoph's patch
> > > probably isn't the best thing to do.
> > 
> > I understand that and I'm not proposing the revert as a long-term
> > fix.
> > For a long-term solution I propose using kernel_write() and
> > extending
> > it to allow the caller to suppress (just) the
> > security_file_permission() call. Then each caller would have to
> > decide
> > whether the LSM check makes sense in that situation or not. It
> > seems
> > safer against future mistakes than leaving it up to the caller to
> > call
> > security_file_permission() explicitly (I predict that no new user
> > would even realize that the call might be needed).
> > 
> > > But any random process does need to be able to write to the
> > > automount
> > > daemon pipe for trailing path components and the root dentry of
> > > autofs
> > > mounts, depending on case.
> > > 
> > > So it's true that any write to any autofs dentry probably doesn't
> > > need to be allowed but I question what that gets us in terms of
> > > security improvement over allowing pipe writes for automount_t
> > > labelled pipes in selinux policy since they must be within an
> > > autofs
> > > mounted file system.
> > 
> > The difference is not in security, but in usability. The kernel
> > communicating with the autofs daemon is an internal detail that
> > shouldn't need special rules in policy. Even if we want to do any
> > LSM
> > checking here, the subject should be kernel_t, not the current
> > running
> > process. The process doesn't have any control on whether the kernel
> > does the communication and it doesn't control the content of the
> > communication, so the permission check as it is doesn't make any
> > sense. People writing the policy should be burdened by low-level
> > details about how the kernel works internally as little as
> > possible.
> > 
> > > But Stephen has a different recommendation (and that appears to
> > > consider the cause I outlined above) so I'll wait to see what
> > > others
> > > think about the recommendations.
> > 
> > As I said above, I think Stephen's approach is less future-proof.
> > And
> > it seems that rw_verify_area() has many other callers, most/all of
> > which probably service actual requests from userspace and we'd need
> > to
> > retain the security_file_permission() call in those cases.
> > 
> > Let me try to put my proposal into a patch, so we have something
> > concrete to talk about...
> 
> Up-thread I thought Linus indicated he didn't really want a flag to
> disable pemission checking due to potential abuse (and I agree).
> Historically we have taken one of two approaches for these
> situations:
> 1) Provide a separate interface like kernel_write() for use when we
> don't want permission checking and don't have it call the security
> hook at all.  If you prefer kernel_write_nosec() that's fine but I
> think that's somewhat implicit in the fact that it is a
> kernel-initiated write, not a userspace write (which would hopefully
> go through vfs_write() or similar and end up calling the hook).
> 2) Temporarily override creds to the init_cred or the result of
> prepare_kernel_creds() before calling any credential-checking
> functions and then revert creds afterward.
> 
> The problem with #2 is that it still requires that the policy allow
> kernel_t (or its equivalent) to be able to write to these pipes,
> which
> wasn't previously necessary and thus might not be allowed in all
> policies (e.g. Android?).  #1 avoids any need for policy for these
> operations.

There is a third option, that's actually to revert only the autofs
change and forgo the couple of extra checks we get from kern_write().

My recommendation of keeping the change and documenting the behaviour
via security policy includes the implicit assumption that there are
no other cases of this.

If that wasn't the case then changing the kernel functions or (likely
better option of) simply using __kern_write() in those cases would be
the better choice IMHO.

Ian