Re: [RFC] capabilities: Ambient capabilities

I'm unclear why you refer to the inheritable set in this test:

+               } else {
+                       if (arg2 == PR_CAP_AMBIENT_RAISE &&
+                           (!cap_raised(current_cred()->cap_permitted, arg3) ||
+                            !cap_raised(current_cred()->cap_inheritable,
+                                        arg3)))
+                               return -EPERM;

I'm also unclear how you can turn off this new 'feature' for a process
tree? As it is, the code creates an exploit path for a capable (pP !=
0) program with an exploitable flaw to create a privilege escalation
for an arbitrary child program. While I understand that everyone
'knows what they are doing' in implementing this change, I'm convinced
that folk that are up to no good also do... Why not provide a lockable
secure bit to selectively disable this support?

Nacked-by: Andrew G. Morgan <morgan@xxxxxxxxxx>

Cheers

Andrew


On Thu, Mar 12, 2015 at 2:49 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> On Thu, Mar 12, 2015 at 11:08 AM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>> Credit where credit is due: this idea comes from Christoph Lameter
>> with a lot of valuable input from Serge Hallyn.  This patch is
>> heavily based on Christoph's patch.
>>
>> ===== The status quo =====
>>
>> On Linux, there are a number of capabilities defined by the kernel.
>> To perform various privileged tasks, processes can wield
>> capabilities that they hold.
>>
>> Each task has four capability masks: effective (pE), permitted (pP),
>> inheritable (pI), and a bounding set (X).  When the kernel checks
>> for a capability, it checks pE.  The other capability masks serve to
>> modify what capabilities can be in pE.
>>
>> Any task can remove capabilities from pE, pP, or pI at any time.  If
>> a task has a capability in pP, it can add that capability to pE
>> and/or pI.  If a task has CAP_SETPCAP, then it can add any
>> capability to pI, and it can remove capabilities from X.
>>
>> Tasks are not the only things that can have capabilities; files can
>> also have capabilities.  A file can have no capability information at
>> all [1].  If a file has capability information, then it has a
>> permitted mask (fP) and an inheritable mask (fI) as well as a single
>> effective bit (fE) [2].  File capabilities modify the capabilities
>> of tasks that execve(2) them.
>>
>> A task that successfully calls execve has its capabilities modified
>> for the file ultimately being executed (i.e. the binary itself if
>> that binary is ELF or for the interpreter if the binary is a
>> script.) [3] In the capability evolution rules, for each mask Z, pZ
>> represents the old value and pZ' represents the new value.  The
>> rules are:
>>
>>   pP' = (X & fP) | (pI & fI)
>>   pI' = pI
>>   pE' = (fE ? pP' : 0)
>>   X is unchanged
>>
>> For setuid binaries, fP, fI, and fE are modified by a moderately
>> complicated set of rules that emulate POSIX behavior.  Similarly, if
>> euid == 0 or ruid == 0, then fP, fI, and fE are modified differently
>> (primarily, fP and fI usually end up being the full set).  For nonroot
>> users executing binaries with neither setuid nor file caps, fI and
>> fP are empty and fE is false.
>>
>> As an extra complication, if you execute a process as nonroot and fE
>> is set, then the "secure exec" rules are in effect: AT_SECURE gets
>> set, LD_PRELOAD doesn't work, etc.
>>
>> This is rather messy.  We've learned that making any changes is
>> dangerous, though: if a new kernel version allows an unprivileged
>> program to change its security state in a way that persists across
>> execution of a setuid program or a program with file caps, this
>> persistent state is surprisingly likely to allow setuid or
>> file-capped programs to be exploited for privilege escalation.
>>
>> ===== The problem =====
>>
>> Capability inheritance is basically useless.
>>
>> If you aren't root and you execute an ordinary binary, fI is zero,
>> so your capabilities have no effect whatsoever on pP'.  This means
>> that you can't usefully execute a helper process or a shell command
>> with elevated capabilities if you aren't root.
>>
>> On current kernels, you can sort of work around this by setting fI
>> to the full set for most or all non-setuid executable files.  This
>> causes pP' = pI for nonroot, and inheritance works.  No one does
>> this because it's a PITA and it isn't even supported on most
>> filesystems.
>>
>> If you try this, you'll discover that every nonroot program ends up
>> with secure exec rules, breaking many things.
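
To make the two paragraphs above concrete, here is a rough userspace model of the pre-patch exec rules and of the capability part of the pre-patch secure-exec test. It is only a sketch: the struct and function names are made up for illustration, each mask is squeezed into one 64-bit word, and the setuid/root fixups to fP, fI and fE are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; one bit per capability. */
struct task_caps { uint64_t pP, pI, pE, X; };
struct file_caps { uint64_t fP, fI; bool fE; };

/* Pre-patch evolution rules as quoted in "The status quo". */
static inline struct task_caps exec_evolve_old(struct task_caps old,
                                               struct file_caps f)
{
    struct task_caps next = old;            /* pI' = pI, X unchanged */

    next.pP = (old.X & f.fP) | (old.pI & f.fI);
    next.pE = f.fE ? next.pP : 0;
    return next;
}

/*
 * Capability part of the unpatched cap_bprm_secureexec() check:
 * a nonroot task gets secure exec if fE is set or pP' is nonempty.
 * The uid/gid comparisons that follow in the kernel are left out.
 */
static inline bool secure_exec_old(bool is_root, struct file_caps f,
                                   struct task_caps after)
{
    if (is_root)
        return false;
    return f.fE || after.pP != 0;
}
```

With pI = CAP_NET_BIND_SERVICE (bit 10) and a plain binary (fP = fI = 0), this gives pP' = 0, so nothing is inherited. With fI set to the full mask, pP' = pI, but secure_exec_old() then returns true for the nonroot task, which is the breakage described above.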
>>
>> This is a problem that has bitten many people who have tried to use
>> capabilities for anything useful.
>>
>> ===== The proposed change =====
>>
>> This patch adds a fifth capability mask called the ambient mask
>> (pA).  pA does what pI should have done.
>>
>> pA obeys the invariant that no bit can ever be set in pA if it is
>> not set in both pP and pI.  Dropping a bit from pP or pI drops that
>> bit from pA.  This ensures that existing programs that try to drop
>> capabilities still do so, with a complication.  Because capability
>> inheritance is so broken, setting KEEPCAPS, using setresuid to
>> switch to nonroot uids, and calling execve effectively drops
>> capabilities.  Therefore, setresuid from root to nonroot
>> unconditionally clears pA.  Processes that don't like this can
>> re-add bits to pA afterwards.
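
The invariant and the setresuid behaviour described above can be sketched in a few lines of userspace C. This is a model, not the kernel code: the names are hypothetical, masks are single 64-bit words, and only the "all of ruid/euid/suid go from root to nonroot" case is covered.

```c
#include <stdint.h>

/* Hypothetical model of a task's masks; one bit per capability. */
struct task_caps {
    uint64_t permitted;   /* pP */
    uint64_t inheritable; /* pI */
    uint64_t effective;   /* pE */
    uint64_t ambient;     /* pA */
};

/* Clamp pA to pP & pI, as cap_enforce_ambient_invariants() does in the patch. */
static inline void enforce_ambient_invariant(struct task_caps *t)
{
    t->ambient &= t->permitted & t->inheritable;
}

/* Dropping bits from pI also drops them from pA. */
static inline void drop_inheritable(struct task_caps *t, uint64_t mask)
{
    t->inheritable &= ~mask;
    enforce_ambient_invariant(t);
}

/*
 * setresuid from root to nonroot: pP is kept only with KEEPCAPS,
 * pE is cleared because euid leaves root, and pA is cleared
 * unconditionally.
 */
static inline void setresuid_root_to_nonroot(struct task_caps *t, int keep_caps)
{
    if (!keep_caps)
        t->permitted = 0;
    t->effective = 0;
    t->ambient = 0;
}
```

A task that sets KEEPCAPS, switches uid and then execs therefore still ends up with an empty pA unless it explicitly raises bits again after the uid change.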
>>
>> The capability evolution rules are changed:
>>
>>   pA' = (file caps or setuid or setgid ? 0 : pA)
>>   pP' = (X & fP) | (pI & fI) | pA'
>>   pI' = pI
>>   pE' = (fE ? pP' : pA')
>>   X is unchanged
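
For anyone who wants to play with the changed rules, they can be written out as a small userspace model. The types and the function name are invented for this sketch, masks are single 64-bit words, and the has_caps/is_setid flags stand in for "file caps or setuid or setgid"; the extra fixups applied to fP/fI/fE for setuid and root are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; one bit per capability. */
struct task_caps { uint64_t pP, pI, pE, pA, X; };
struct file_caps {
    uint64_t fP, fI;
    bool fE;
    bool has_caps;  /* file carries capability information */
    bool is_setid;  /* exec changes euid or egid */
};

/* Evolution rules with the ambient mask, as listed above. */
static inline struct task_caps exec_evolve(struct task_caps old,
                                           struct file_caps f)
{
    struct task_caps next = old;    /* pI' = pI, X unchanged */

    next.pA = (f.has_caps || f.is_setid) ? 0 : old.pA;
    next.pP = (old.X & f.fP) | (old.pI & f.fI) | next.pA;
    next.pE = f.fE ? next.pP : next.pA;
    return next;
}
```

With pA = 0 this reduces to the old rules. With pA = CAP_NET_BIND_SERVICE (bit 10) and a plain binary, the child gets that bit in pA, pP and pE; marking the file as setid or file-capped clears pA before it can contribute anything.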
>>
>> If you are nonroot but you have a capability, you can add it to pA.
>> If you do so, your children get that capability in pA, pP, and pE.
>> For example, you can set pA = CAP_NET_BIND_SERVICE, and your
>> children can automatically bind low-numbered ports.  Hallelujah!
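
The raise/lower semantics can be modelled in userspace as below. This is only a sketch of the PR_CAP_AMBIENT branch in the patch: the struct, the function name and MODEL_CAP_LAST are assumptions made for illustration (the kernel uses cap_valid() and works on a copy of the credentials), while the operation values match the patch's prctl.h.

```c
#include <errno.h>
#include <stdint.h>

/* Sub-operation values as defined by the patch. */
enum ambient_op { AMBIENT_GET = 1, AMBIENT_RAISE = 2, AMBIENT_LOWER = 3 };

/* Hypothetical model of the masks involved; one bit per capability. */
struct task_caps { uint64_t pP, pI, pA; };

/* Assumed upper bound for this 64-bit model, standing in for cap_valid(). */
#define MODEL_CAP_LAST 63u

static inline int ambient_prctl_model(struct task_caps *t, int op, unsigned cap)
{
    uint64_t bit;

    if (cap > MODEL_CAP_LAST)
        return -EINVAL;
    bit = UINT64_C(1) << cap;

    switch (op) {
    case AMBIENT_GET:
        return (t->pA & bit) != 0;
    case AMBIENT_RAISE:
        /* Raising requires the bit in both pP and pI. */
        if (!(t->pP & bit) || !(t->pI & bit))
            return -EPERM;
        t->pA |= bit;
        return 0;
    case AMBIENT_LOWER:
        /* Lowering is always allowed. */
        t->pA &= ~bit;
        return 0;
    default:
        return -EINVAL;
    }
}
```

So a task holding CAP_NET_BIND_SERVICE only in pP gets -EPERM on raise until it has also added the bit to pI, which keeps the pA invariant intact without a later clamp.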
>>
>> Unprivileged users can create user namespaces, map themselves to a
>> nonzero uid, and create both privileged (relative to their
>> namespace) and unprivileged process trees.  This is currently more
>> or less impossible.  Hallelujah!
>>
>> You cannot use pA to try to subvert a setuid, setgid, or file-capped
>> program: if you execute any such program, pA gets cleared and the
>> resulting evolution rules are unchanged by this patch.
>>
>> Users with nonzero pA are unlikely to unintentionally leak that
>> capability.  If they run programs that try to drop privileges,
>> dropping privileges will still work.
>>
>> It's worth noting that the degree of paranoia in this patch could
>> possibly be relaxed without causing serious problems.  Specifically,
>> if we allowed pA to persist across executing non-pA-aware setuid
>> binaries and across setresuid, then, naively, the only capabilities
>> that could leak as a result would be the capabilities in pA, and any
>> attacker *already* has those capabilities.  This would make me
>> nervous, though -- setuid binaries that tried to privilege-separate
>> might fail to do so, and putting CAP_DAC_READ_SEARCH or
>> CAP_DAC_OVERRIDE into pA could have unexpected side effects.
>> (Whether these unexpected side effects would be exploitable is an
>> open question.)  I've therefore taken the more paranoid route.
>>
>> An alternative would be to require PR_SET_NO_NEW_PRIVS before
>> setting ambient capabilities.  I think that this would be annoying
>> and would make granting otherwise unprivileged users minor ambient
>> capabilities (CAP_NET_BIND_SERVICE or CAP_NET_RAW for example) much
>> less useful than it is with this patch.
>>
>> ===== Footnotes =====
>>
>> [1] Files that are missing the "security.capability" xattr or that
>> have unrecognized values for that xattr end up with has_cap ==
>> false.  The code that does that appears to be complicated for no
>> good reason.
>>
>> [2] The libcap capability mask parsers and formatters are
>> dangerously misleading and the documentation is flat-out wrong.  fE
>> is *not* a mask; it's a single bit.  This has probably confused
>> every single person who has tried to use file capabilities.
>>
>> [3] Linux very confusingly processes both the script and the
>> interpreter, if applicable, for reasons that escape me.  The results
>> from thinking about a script's file capabilities and/or setuid bits
>> are mostly discarded.
>>
>> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
>> Cc: Christoph Lameter <cl@xxxxxxxxx>
>> Cc: Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx>
>> Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
>> Cc: Jonathan Corbet <corbet@xxxxxxx>
>> Cc: Aaron Jones <aaronmdjones@xxxxxxxxx>
>> CC: Ted Ts'o <tytso@xxxxxxx>
>> Cc: linux-security-module@xxxxxxxxxxxxxxx
>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>> Cc: linux-api@xxxxxxxxxxxxxxx
>> Cc: akpm@xxxxxxxxxxxxxxxxxxx
>> Cc: Andrew G. Morgan <morgan@xxxxxxxxxx>
>> Cc: Mimi Zohar <zohar@xxxxxxxxxxxxxxxxxx>
>> Cc: Austin S Hemmelgarn <ahferroin7@xxxxxxxxx>
>> Cc: Markku Savela <msa@xxxxxxxxxxx>
>> Cc: Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx>
>> Cc: Michael Kerrisk <mtk.manpages@xxxxxxxxx>
>> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
>
> This would be quite welcome for things we're doing in Chrome OS.
> Presently, we're able to use fscaps to keep non-root caps across exec
> and haven't encountered issues with AT_SECURE (yet), but using pA
> would be much nicer and exactly matches how we want to use it: a
> launcher is creating a tree of processes that are non-root but need
> some capabilities. Right now the tree is very small and we're able to
> sprinkle our fscaps lightly. :) This would be better.
>
> -Kees
>
>> ---
>>
>> Preliminary userspace code is here:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/luto/util-linux-playground.git/commit/?h=cap_ambient&id=860c73ac1acaaae976bdd3bb83b89b0180f0702a
>>
>> fs/proc/array.c              |  5 ++-
>>  include/linux/cred.h         | 15 +++++++++
>>  include/uapi/linux/prctl.h   |  6 ++++
>>  kernel/user_namespace.c      |  1 +
>>  security/commoncap.c         | 75 ++++++++++++++++++++++++++++++++++++++------
>>  security/keys/process_keys.c |  1 +
>>  6 files changed, 92 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/proc/array.c b/fs/proc/array.c
>> index 1295a00ca316..bc15356d6551 100644
>> --- a/fs/proc/array.c
>> +++ b/fs/proc/array.c
>> @@ -282,7 +282,8 @@ static void render_cap_t(struct seq_file *m, const char *header,
>>  static inline void task_cap(struct seq_file *m, struct task_struct *p)
>>  {
>>         const struct cred *cred;
>> -       kernel_cap_t cap_inheritable, cap_permitted, cap_effective, cap_bset;
>> +       kernel_cap_t cap_inheritable, cap_permitted, cap_effective,
>> +                       cap_bset, cap_ambient;
>>
>>         rcu_read_lock();
>>         cred = __task_cred(p);
>> @@ -290,12 +291,14 @@ static inline void task_cap(struct seq_file *m, struct task_struct *p)
>>         cap_permitted   = cred->cap_permitted;
>>         cap_effective   = cred->cap_effective;
>>         cap_bset        = cred->cap_bset;
>> +       cap_ambient     = cred->cap_ambient;
>>         rcu_read_unlock();
>>
>>         render_cap_t(m, "CapInh:\t", &cap_inheritable);
>>         render_cap_t(m, "CapPrm:\t", &cap_permitted);
>>         render_cap_t(m, "CapEff:\t", &cap_effective);
>>         render_cap_t(m, "CapBnd:\t", &cap_bset);
>> +       render_cap_t(m, "CapAmb:\t", &cap_ambient);
>>  }
>>
>>  static inline void task_seccomp(struct seq_file *m, struct task_struct *p)
>> diff --git a/include/linux/cred.h b/include/linux/cred.h
>> index 2fb2ca2127ed..a21bcba6ef84 100644
>> --- a/include/linux/cred.h
>> +++ b/include/linux/cred.h
>> @@ -122,6 +122,7 @@ struct cred {
>>         kernel_cap_t    cap_permitted;  /* caps we're permitted */
>>         kernel_cap_t    cap_effective;  /* caps we can actually use */
>>         kernel_cap_t    cap_bset;       /* capability bounding set */
>> +       kernel_cap_t    cap_ambient;    /* Ambient capability set */
>>  #ifdef CONFIG_KEYS
>>         unsigned char   jit_keyring;    /* default keyring to attach requested
>>                                          * keys to */
>> @@ -197,6 +198,20 @@ static inline void validate_process_creds(void)
>>  }
>>  #endif
>>
>> +static inline void cap_enforce_ambient_invariants(struct cred *cred)
>> +{
>> +       cred->cap_ambient = cap_intersect(cred->cap_ambient,
>> +                                         cap_intersect(cred->cap_permitted,
>> +                                                       cred->cap_inheritable));
>> +}
>> +
>> +static inline bool cap_ambient_invariant_ok(const struct cred *cred)
>> +{
>> +       return cap_issubset(cred->cap_ambient,
>> +                           cap_intersect(cred->cap_permitted,
>> +                                         cred->cap_inheritable));
>> +}
>> +
>>  /**
>>   * get_new_cred - Get a reference on a new set of credentials
>>   * @cred: The new credentials to reference
>> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
>> index 31891d9535e2..65407f867e82 100644
>> --- a/include/uapi/linux/prctl.h
>> +++ b/include/uapi/linux/prctl.h
>> @@ -190,4 +190,10 @@ struct prctl_mm_map {
>>  # define PR_FP_MODE_FR         (1 << 0)        /* 64b FP registers */
>>  # define PR_FP_MODE_FRE                (1 << 1)        /* 32b compatibility */
>>
>> +/* Control the ambient capability set */
>> +#define PR_CAP_AMBIENT         47
>> +# define PR_CAP_AMBIENT_GET    1
>> +# define PR_CAP_AMBIENT_RAISE  2
>> +# define PR_CAP_AMBIENT_LOWER  3
>> +
>>  #endif /* _LINUX_PRCTL_H */
>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index 4109f8320684..dab0f808235a 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -39,6 +39,7 @@ static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns)
>>         cred->cap_inheritable = CAP_EMPTY_SET;
>>         cred->cap_permitted = CAP_FULL_SET;
>>         cred->cap_effective = CAP_FULL_SET;
>> +       cred->cap_ambient = CAP_EMPTY_SET;
>>         cred->cap_bset = CAP_FULL_SET;
>>  #ifdef CONFIG_KEYS
>>         key_put(cred->request_key_auth);
>> diff --git a/security/commoncap.c b/security/commoncap.c
>> index f66713bd7450..b3253886ecad 100644
>> --- a/security/commoncap.c
>> +++ b/security/commoncap.c
>> @@ -272,6 +272,7 @@ int cap_capset(struct cred *new,
>>         new->cap_effective   = *effective;
>>         new->cap_inheritable = *inheritable;
>>         new->cap_permitted   = *permitted;
>> +       cap_enforce_ambient_invariants(new);
>>         return 0;
>>  }
>>
>> @@ -352,6 +353,7 @@ static inline int bprm_caps_from_vfs_caps(struct cpu_vfs_cap_data *caps,
>>
>>                 /*
>>                  * pP' = (X & fP) | (pI & fI)
>> +                * The addition of pA' is handled later.
>>                  */
>>                 new->cap_permitted.cap[i] =
>>                         (new->cap_bset.cap[i] & permitted) |
>> @@ -479,10 +481,12 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
>>  {
>>         const struct cred *old = current_cred();
>>         struct cred *new = bprm->cred;
>> -       bool effective, has_cap = false;
>> +       bool effective, has_cap = false, is_setid;
>>         int ret;
>>         kuid_t root_uid;
>>
>> +       BUG_ON(!cap_ambient_invariant_ok(old));
>> +
>>         effective = false;
>>         ret = get_file_caps(bprm, &effective, &has_cap);
>>         if (ret < 0)
>> @@ -527,8 +531,9 @@ skip:
>>          *
>>          * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
>>          */
>> -       if ((!uid_eq(new->euid, old->uid) ||
>> -            !gid_eq(new->egid, old->gid) ||
>> +       is_setid = !uid_eq(new->euid, old->uid) || !gid_eq(new->egid, old->gid);
>> +
>> +       if ((is_setid ||
>>              !cap_issubset(new->cap_permitted, old->cap_permitted)) &&
>>             bprm->unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
>>                 /* downgrade; they get no more than they had, and maybe less */
>> @@ -544,10 +549,23 @@ skip:
>>         new->suid = new->fsuid = new->euid;
>>         new->sgid = new->fsgid = new->egid;
>>
>> +       /* File caps or setid cancel ambient. */
>> +       if (has_cap || is_setid)
>> +               cap_clear(new->cap_ambient);
>> +
>> +       /*
>> +        * Now that we've computed pA', update pP' to give:
>> +        *   pP' = (X & fP) | (pI & fI) | pA'
>> +        */
>> +       new->cap_permitted = cap_combine(new->cap_permitted, new->cap_ambient);
>> +
>>         if (effective)
>>                 new->cap_effective = new->cap_permitted;
>>         else
>> -               cap_clear(new->cap_effective);
>> +               new->cap_effective = new->cap_ambient;
>> +
>> +       BUG_ON(!cap_ambient_invariant_ok(new));
>> +
>>         bprm->cap_effective = effective;
>>
>>         /*
>> @@ -562,7 +580,7 @@ skip:
>>          * Number 1 above might fail if you don't have a full bset, but I think
>>          * that is interesting information to audit.
>>          */
>> -       if (!cap_isclear(new->cap_effective)) {
>> +       if (!cap_issubset(new->cap_effective, new->cap_ambient)) {
>>                 if (!cap_issubset(CAP_FULL_SET, new->cap_effective) ||
>>                     !uid_eq(new->euid, root_uid) || !uid_eq(new->uid, root_uid) ||
>>                     issecure(SECURE_NOROOT)) {
>> @@ -573,6 +591,9 @@ skip:
>>         }
>>
>>         new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
>> +
>> +       BUG_ON(!cap_ambient_invariant_ok(new));
>> +
>>         return 0;
>>  }
>>
>> @@ -594,7 +615,7 @@ int cap_bprm_secureexec(struct linux_binprm *bprm)
>>         if (!uid_eq(cred->uid, root_uid)) {
>>                 if (bprm->cap_effective)
>>                         return 1;
>> -               if (!cap_isclear(cred->cap_permitted))
>> +               if (!cap_issubset(cred->cap_permitted, cred->cap_ambient))
>>                         return 1;
>>         }
>>
>> @@ -696,10 +717,18 @@ static inline void cap_emulate_setxuid(struct cred *new, const struct cred *old)
>>              uid_eq(old->suid, root_uid)) &&
>>             (!uid_eq(new->uid, root_uid) &&
>>              !uid_eq(new->euid, root_uid) &&
>> -            !uid_eq(new->suid, root_uid)) &&
>> -           !issecure(SECURE_KEEP_CAPS)) {
>> -               cap_clear(new->cap_permitted);
>> -               cap_clear(new->cap_effective);
>> +            !uid_eq(new->suid, root_uid))) {
>> +               if (!issecure(SECURE_KEEP_CAPS)) {
>> +                       cap_clear(new->cap_permitted);
>> +                       cap_clear(new->cap_effective);
>> +               }
>> +
>> +               /*
>> +                * Pre-ambient programs expect setresuid to nonroot followed
>> +                * by exec to drop capabilities.  We should make sure that
>> +                * this remains the case.
>> +                */
>> +               cap_clear(new->cap_ambient);
>>         }
>>         if (uid_eq(old->euid, root_uid) && !uid_eq(new->euid, root_uid))
>>                 cap_clear(new->cap_effective);
>> @@ -929,6 +958,32 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
>>                         new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
>>                 return commit_creds(new);
>>
>> +       case PR_CAP_AMBIENT:
>> +               if (((!cap_valid(arg3)) | arg4 | arg5))
>> +                       return -EINVAL;
>> +
>> +               if (arg2 == PR_CAP_AMBIENT_GET) {
>> +                       return !!cap_raised(current_cred()->cap_ambient, arg3);
>> +               } else if (arg2 != PR_CAP_AMBIENT_RAISE &&
>> +                          arg2 != PR_CAP_AMBIENT_LOWER) {
>> +                       return -EINVAL;
>> +               } else {
>> +                       if (arg2 == PR_CAP_AMBIENT_RAISE &&
>> +                           (!cap_raised(current_cred()->cap_permitted, arg3) ||
>> +                            !cap_raised(current_cred()->cap_inheritable,
>> +                                        arg3)))
>> +                               return -EPERM;
>> +
>> +                       new = prepare_creds();
>> +                       if (!new)
>> +                               return -ENOMEM;
>> +                       if (arg2 == PR_CAP_AMBIENT_RAISE)
>> +                               cap_raise(new->cap_ambient, arg3);
>> +                       else
>> +                               cap_lower(new->cap_ambient, arg3);
>> +                       return commit_creds(new);
>> +               }
>> +
>>         default:
>>                 /* No functionality available - continue with default */
>>                 return -ENOSYS;
>> diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
>> index bd536cb221e2..43b4cddbf2b3 100644
>> --- a/security/keys/process_keys.c
>> +++ b/security/keys/process_keys.c
>> @@ -848,6 +848,7 @@ void key_change_session_keyring(struct callback_head *twork)
>>         new->cap_inheritable    = old->cap_inheritable;
>>         new->cap_permitted      = old->cap_permitted;
>>         new->cap_effective      = old->cap_effective;
>> +       new->cap_ambient        = old->cap_ambient;
>>         new->cap_bset           = old->cap_bset;
>>
>>         new->jit_keyring        = old->jit_keyring;
>> --
>> 2.3.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>
>
>
> --
> Kees Cook
> Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



