Re: semodule -i and load_policy coredumps on version 3.0 - not latest GIT

Russell Coker <russell@xxxxxxxxxxxx> · Tue, 21 Apr 2020 14:01:45 +1000

On Wednesday, 15 April 2020 3:27:38 AM AEST Nicolas Iooss wrote:
> This looks a pretty difficult issue. The facts that it is not easily
> reproducible and that the stack trace changes even though the 2
> modules you are testing do not are interesting. They imply that there

I have done more further testing.

I could not reproduce it on another VM on the same hardware.

I could not reproduce it on the same VM after a reboot of the physical 
hardware (running Debian/Unstable with KVM).

After the reboot I could not reproduce it on saved snapshots of the VM in 
question dating back to when I had previously had problems.  I conclude that 
rebooting the hardware solved the problem.

The problem was either an issue of failing hardware (I am running memtest86+ 
right now) or hostile action.  When testing for issues with libsepol I got a 
couple of coredumps from valgrind, that isn't necessarily an indication of 
anything (valgrind is complex software and it provides information on how to 
report bugs when it crashes so crashes of valgrind aren't unexpected).  I also 
got one coredump from sshd which is very unexpected, sshd is known to be high 
quality software that is well written and well audited.  This makes me wonder 
whether there is some commonality between sshd and semodule that causes both 
of them to have had problems on the system in question.  For background the 
sshd coredump info is below.

# coredumpctl info /usr/sbin/sshd
           PID: 42696 (sshd)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Tue 2020-04-14 19:48:42 UTC (6 days ago)
  Command Line: sshd: [accepted]
    Executable: /usr/sbin/sshd
       Boot ID: eec56f683e7b4aeb90a89845bd7920f8
    Machine ID: 384a085cdf4a437cae153168e34245f4
      Hostname: play
       Storage: /var/lib/systemd/coredump/core.sshd.
0.eec56f683e7b4aeb90a89845bd7920f8.42696.1586893722000000000000.lz4
       Message: Process 42696 (sshd) of user 0 dumped core.

                Stack trace of thread 42696:
                #0  0x00007f2dfe8da2e7 dl_new_hash (ld-linux-x86-64.so.2 + 
0xa2e7)
                #1  0x00007f2dfe8deaf3 _dl_fixup (ld-linux-x86-64.so.2 + 
0xeaf3)
                #2  0x00007f2dfe8e5383 _dl_runtime_resolve_fxsave (ld-linux-
x86-64.so.2 + 0x15383)
                #3  0x00007f2dfe1453e0 n/a (libcap-ng.so.0 + 0x23e0)
                #4  0x00007f2dfe1e9c78 __run_fork_handlers (libc.so.6 + 
0x84c78)
                #5  0x00007f2dfe22ffb8 __libc_fork (libc.so.6 + 0xcafb8)
                #6  0x000055ab9ae2bac9 n/a (sshd + 0xfac9)
                #7  0x00007f2dfe18be0b __libc_start_main (libc.so.6 + 0x26e0b)
                #8  0x000055ab9ae2bf7a n/a (sshd + 0xff7a)

I ran the spectre-meltdown-checker script, it says that the physical hardware 
in question is vulnerable to the following (there doesn't seem to be microcode 
updates for the Q9505 CPU to fix all the issues):

CVE-2018-3640 aka 'Variant 3a, rogue system register read'
* CPU microcode mitigates the vulnerability:  NO 
> STATUS:  VULNERABLE  (an up-to-date CPU microcode is needed to mitigate this 
vulnerability)

CVE-2018-3639 aka 'Variant 4, speculative store bypass'
* Mitigated according to the /sys interface:  NO  (Vulnerable)
* Kernel supports disabling speculative store bypass (SSB):  YES  (found in /
proc/self/status)
* SSB mitigation is enabled and active:  NO 
> STATUS:  VULNERABLE  (Your CPU doesn't support SSBD)

CVE-2018-12126 aka 'Fallout, microarchitectural store buffer data sampling 
(MSBDS)'
* Mitigated according to the /sys interface:  NO  (Vulnerable: Clear CPU 
buffers attempted, no microcode; SMT disabled)
* Kernel supports using MD_CLEAR mitigation:  YES  (found md_clear 
implementation evidence in kernel image)
* Kernel mitigation is enabled and active:  NO 
* SMT is either mitigated or disabled:  YES 
> STATUS:  VULNERABLE  (Your kernel supports mitigation, but your CPU 
microcode also needs to be updated to mitigate the vulnerability)

CVE-2018-12130 aka 'ZombieLoad, microarchitectural fill buffer data sampling 
(MFBDS)'
* Mitigated according to the /sys interface:  NO  (Vulnerable: Clear CPU 
buffers attempted, no microcode; SMT disabled)
* Kernel supports using MD_CLEAR mitigation:  YES  (found md_clear 
implementation evidence in kernel image)
* Kernel mitigation is enabled and active:  NO 
* SMT is either mitigated or disabled:  YES 
> STATUS:  VULNERABLE  (Your kernel supports mitigation, but your CPU 
microcode also needs to be updated to mitigate the vulnerability)

CVE-2018-12127 aka 'RIDL, microarchitectural load port data sampling (MLPDS)'
* Mitigated according to the /sys interface:  NO  (Vulnerable: Clear CPU 
buffers attempted, no microcode; SMT disabled)
* Kernel supports using MD_CLEAR mitigation:  YES  (found md_clear 
implementation evidence in kernel image)
* Kernel mitigation is enabled and active:  NO 
* SMT is either mitigated or disabled:  YES 
> STATUS:  VULNERABLE  (Your kernel supports mitigation, but your CPU 
microcode also needs to be updated to mitigate the vulnerability)

CVE-2019-11091 aka 'RIDL, microarchitectural data sampling uncacheable memory 
(MDSUM)'
* Mitigated according to the /sys interface:  NO  (Vulnerable: Clear CPU 
buffers attempted, no microcode; SMT disabled)
* Kernel supports using MD_CLEAR mitigation:  YES  (found md_clear 
implementation evidence in kernel image)
* Kernel mitigation is enabled and active:  NO 
* SMT is either mitigated or disabled:  YES 
> STATUS:  VULNERABLE  (Your kernel supports mitigation, but your CPU 
microcode also needs to be updated to mitigate the vulnerability)

Given that the system is vulnerable to certain known attacks and that sshd is 
a prime target for any such attack I believe that the sshd SEGV is an 
indication that the root cause might have been hostile action.  I don't expect 
to ever have proof of what was the cause (unless memtest86+ flags an error).  
When hostile activity goes away on reboot then something memory resident is 
likely in which case there's probably no record on disk.

I am convinced beyond all reasonable doubt that the SEGVs and valgrind 
warnings I saw from semodule were not evidence of a bug in libsepol.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/