Re: [BUG RFD]selinux: sidtab_context_to_sid return -NOMEM when concurrent with security_load_policy

Stephen Smalley <stephen.smalley@xxxxxxxxx> · Thu, 28 Dec 2017 09:57:58 -0500

On Sat, Dec 23, 2017 at 3:03 AM, Li Kun <hw.likun@xxxxxxxxxx> wrote:

    Hi all,

    When i start a docker container, the runc will call
    selinux_setprocattr to set the exec_sid before start the container.

    Meanwhile if i use "semodule -i" to load a policy pp, the old sidtab
    will be shutdown before switch to the new sidtab, and cause 

    sidtab_context_to_sid  failed with the errno -NOMEM.

    Error message:

    Dec 14 09:18:18 wuwenming_vudfua_0 docker:
    time="2017-12-14T09:18:18.549862564+08:00" level=error msg="Handler
    for GET
    /v1.23/containers/192.168.0.2:9082/dfs_d:V100R013C11SPC000BCD2/json
    returned error: No such container:
    192.168.0.2:9082/dfs_d:V100R013C11SPC000BCD2"

    Dec 14 09:18:18 wuwenming_vudfua_0 kernel: [   48.463179] SELinux:
security_context_to_sid(system_u:object_r:svirt_sandbox_file_t:s0:c290,c371)
    failed for (dev overlay, type overlay) errno=-12

    Dec 14 09:18:18 wuwenming_vudfua_0 dbus-daemon: dbus[720]: avc: 
    received policyload notice (seqno=2)

    Dec 14 09:18:18 wuwenming_vudfua_0 dbus[720]: avc:  received
    policyload notice (seqno=2)

runc                                                                                                                           
            semodule

->selinux_setprocattr                                                                                                          
    ->security_load_policy

        ->security_context_to_sid                                   

    ->sidtab_shutdown(oldsidtab)

            ->read_lock(&policy_rwlock);                       

        ->sidtab_map(&oldsidtab, clone_sid, &newsidtab);

            ->sidtab_context_to_sid                               

                {

                            ....

                            if (s->next_sid == UINT_MAX || s->shutdown) {

                            ret = -ENOMEM;

                            ....

                }

            ->read_unlock(&policy_rwlock);                       

            ->write_lock_irq(&policy_rwlock);

              ->"switch to new policydb"

            ->write_unlock_irq(&policy_rwlock);

    I wonder that if this is the intention or a bug?

    If it is the intention, what should the application do when it get
    -ENOMEM error,  to try again?

    If it is a bug, may the two options below suitable to solve the
    issue?

    Option1:

        use policy_rwlock to protect the "sidtab_shutdown" &
    "sidtab_map" , load_policy is rarely to be called after system
    bringing up,  

    so i think it will not impact much on the performance.

    Option2:

        Use a temp list to store the sid in oldsidtab after it is
    shutdown, and deal with it 

    after cloning the major sids from oldsidtab to newsidtab and getting
    the policy_rwlock.

The current logic is intentional.  One might argue that a different error code might be more suitable in this situation (e.g. EAGAIN or something), but ENOMEM is already a possible error code when loading policy upon transient OOM conditions, so the caller might have to retry regardless.  The policy write lock must only be held for the minimal critical section as otherwise all system activity could be blocked; it is only held when making the new policydb and sidtab active not during their allocation and setup - ideally it would be further reduced to a couple of pointer updates. Is this scenario something you are encountering in actual practice or just a theoretically possible one?