Re: semodule -i and load_policy coredumps on version 3.0 - not latest GIT

Nicolas Iooss <nicolas.iooss@xxxxxxx> · Tue, 14 Apr 2020 19:27:38 +0200

On Tue, Apr 14, 2020 at 2:29 AM Russell Coker <russell@xxxxxxxxxxxx> wrote:
>
> I'm getting core dumps from inserting modules, I can repeatedly run semodule
> with the same module and have it crash some times and not others.  But it
> crashes more often if I have 2 slightly different modules of the same name and
> alternate between inserting them.
>
> while semodule -i pol/toadd.pp && sleep 8 && semodule -i pol2/toadd.pp &&
> sleep 8 ; do date ; done
>
> The above shell command is pretty good at causing SEGVs.
>
> This happens regularly with libsepol version 3.0 (which is in Debian/
> Unstable), so far I have not reproduced it with the latest git version of
> libsepol.  While I'm not certain the bug is fixed in the latest git version, I
> think it's very likely to be fixed (I'll have to run tests for another couple
> of days to be convinced).  Have libsepol developers knowingly fixed such a bug?
>
> Here's coredumpctl output from semodule (at the time libsepol wasn't compiled
> with debugging symbols):
>
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/sbin/semodule -i toadd.pp'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120
> 120     ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory.
> (gdb) bt
> #0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120
> #1  0x00007ff2128cf756 in __vfprintf_internal (s=s@entry=0x7ffecc31daa0,
>     format=format@entry=0x7ff212af88f9 "Error: Unknown keyword %s\n",
>     ap=ap@entry=0x7ffecc31de40, mode_flags=mode_flags@entry=2)
>     at vfprintf-internal.c:1688
> #2  0x00007ff2128e11f6 in __vsnprintf_internal (
>     string=0x7ffecc31dc20 "Error: Unknown keyword ", maxlen=<optimized out>,
>     format=0x7ff212af88f9 "Error: Unknown keyword %s\n", args=0x7ffecc31de40,
>     mode_flags=2) at vsnprintf.c:114
>
> Here's one from load_policy which I believe is related.  Running semodule -i
> repeatedly on the same file doesn't seem to cause a problem (I've had a loop of
> that run for hours without a SEGV) but it happened quickly when alternately
> loading 2 slightly different files.
>
>   Command Line: /sbin/load_policy
>     Executable: /usr/sbin/load_policy
>        Boot ID: 8727799a8e0b44f1885f1b4c681efea9
>     Machine ID: 384a085cdf4a437cae153168e34245f4
>       Hostname: play
>        Storage: /var/lib/systemd/coredump/core.load_policy.0.8727799a8e0b44f188>
>        Message: Process 70655 (load_policy) of user 0 dumped core.
>
>                 Stack trace of thread 70655:
>                 #0  0x00007f0716a6685d ebitmap_destroy (libsepol.so.1 + 0x1185d)
>                 #1  0x00007f0716a635eb constraint_expr_destroy (libsepol.so.1 +>
>                 #2  0x00007f0716aa7d71 class_destroy (libsepol.so.1 + 0x52d71)
>                 #3  0x00007f0716a73893 hashtab_map (libsepol.so.1 + 0x1e893)
>                 #4  0x00007f0716aa86b6 symtabs_destroy (libsepol.so.1 + 0x536b6)
>                 #5  0x00007f0716aa822b policydb_destroy (libsepol.so.1 + 0x5322>
>                 #6  0x00007f0716ab091a policydb_to_image (libsepol.so.1 + 0x5b9>
>                 #7  0x00007f0716ab0e08 sepol_policydb_to_image (libsepol.so.1 +>
>                 #8  0x00007f0716a3eadc selinux_mkload_policy (libselinux.so.1 +>
>                 #9  0x00005560e76d12bf n/a (load_policy + 0x12bf)
>                 #10 0x00007f071688de0b __libc_start_main (libc.so.6 + 0x26e0b)
>                 #11 0x00005560e76d134a n/a (load_policy + 0x134a)
[...]

Hello,
This looks like a pretty difficult issue. It is interesting that the
crash is not easily reproducible and that the stack trace changes even
though the 2 modules you are testing do not. This implies that some
randomness is involved. As far as I remember from the code I've read
so far, SELinux's userspace utilities written in C do not use random
numbers. So this non-reproducibility could be caused by something
else, like the order in which files are listed in directories in your
filesystem (for example in /var/lib/selinux...) or ASLR (Address
Space Layout Randomization).
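
To get a feeling for whether directory listing order could play a
role, one option is to compare the order in which the filesystem
returns entries with the sorted order. This is only a rough sketch,
assuming GNU find (for -mindepth/-maxdepth); the helper names are
made up for illustration and the directory to inspect is up to you:

```shell
#!/bin/sh
# readdir_order DIR: print the entries of DIR in the order the filesystem
# returns them (find does not sort its output).
readdir_order() {
    find "$1" -mindepth 1 -maxdepth 1
}

# listing_is_sorted DIR: succeed if that order happens to match the
# byte-wise sorted order of the same entries.
listing_is_sorted() {
    [ "$(readdir_order "$1")" = "$(readdir_order "$1" | LC_ALL=C sort)" ]
}
```

For example: listing_is_sorted /some/dir || echo "entries come back unsorted"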

The first trace seems to hint at a buffer overflow. A failure in
ebitmap_destroy() when destroying a policydb object (with
policydb_destroy()) likely means that the object was corrupted in
some way. This makes the hypothesis "you don't have reproducibility
because of ASLR" likely, for example if pointers get used and the
execution path changes depending on their raw values.
In order to test this hypothesis, could you run the while loop with
ASLR disabled, for example with "setarch $(uname -m) -R semodule -i
pol/toadd.pp"? Does it continue to fail randomly?
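
The ASLR-disabled variant of the reproducer could look roughly like
the following. It is a sketch rather than something I have run
against your setup: the function names are hypothetical, the module
paths and the 8-second sleeps are taken from your loop, and it
assumes setarch from util-linux is available.

```shell
#!/bin/sh
# run_until_failure MAX CMD [ARGS...]: run CMD up to MAX times and stop at
# the first non-zero exit status (a SIGSEGV shows up as status 139).
run_until_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on run $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure after $max runs"
}

# no_aslr CMD [ARGS...]: run CMD with address space randomization disabled.
no_aslr() {
    setarch "$(uname -m)" -R "$@"
}

# One round of the original reproducer, with ASLR disabled for each semodule.
alternate_once() {
    no_aslr semodule -i pol/toadd.pp && sleep 8 &&
        no_aslr semodule -i pol2/toadd.pp && sleep 8
}
```

Running "run_until_failure 500 alternate_once" as root would then
report the first round that fails, if any. The limit of 500 is
arbitrary.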

In order to test whether this bug is a buffer overflow, another thing
you could do is recompile semodule, libsepol and libsemanage
with the Address Sanitizer (for example by cloning the git repository
at the 3.0 tag, running "make DESTDIR=$HOME/selinux-asan CC='gcc
-no-pie -fsanitize=address' install" and pointing your
LD_LIBRARY_PATH and PATH at the newly built files). This might show
where a buffer overflow occurs.
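
Spelled out, the Address Sanitizer build could be sketched as below.
Several things here are assumptions on my side and may need adjusting:
the repository URL, the tag being named exactly "3.0", the name of the
checkout directory, and the install layout under DESTDIR (libraries in
lib and usr/lib, tools in usr/sbin and usr/bin). The function names
are only for illustration.

```shell
#!/bin/sh
# build_asan DESTDIR: clone the userspace tree and install an ASan build
# into DESTDIR. Runs in a subshell so the caller's directory is unchanged.
# Assumed: repository URL, tag name "3.0", checkout directory name.
build_asan() (
    git clone https://github.com/SELinuxProject/selinux.git selinux-asan-src &&
        cd selinux-asan-src &&
        git checkout 3.0 &&
        make DESTDIR="$1" CC='gcc -no-pie -fsanitize=address' install
)

# use_asan_build DESTDIR: put the freshly built libraries and tools first
# in the search paths of the current shell.
# Assumed: libraries land in lib and usr/lib, tools in usr/sbin and usr/bin.
use_asan_build() {
    LD_LIBRARY_PATH="$1/lib:$1/usr/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    PATH="$1/usr/sbin:$1/usr/bin:$PATH"
    export LD_LIBRARY_PATH PATH
}
```

After "build_asan $HOME/selinux-asan" and "use_asan_build
$HOME/selinux-asan", "command -v semodule" should print a path under
$HOME/selinux-asan before you rerun the loop.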

As for "Have libsepol developers knowingly fixed such a bug?": recent
commits changed a few things in libsepol's internals, but I do not
know of any commit that would specifically fix the bug you have.

Best,
Nicolas