core dump analysis, was Re: stack smashing detected

On Sat, 18 Feb 2023, I wrote:

> On Fri, 17 Feb 2023, Stan Johnson wrote:
>
>> That's not to say a SIGABRT is ignored; it just doesn't kill PID 1.
>
> I doubt that /sbin/init is generating the "stack smashing detected" 
> error but you may need to modify it to find out. If you can't figure out 
> which userland binary is involved, you'll have to focus on your custom 
> kernel binary, just as I proposed in my message dated 8 Feb 2023.


Using the core dump I generated on my Mac LC III, together with a 
workaround for the gdb regression, I was able to get the backtrace below.

root@(none):/root# gdb
GNU gdb (Debian 13.1-2) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
...
(gdb) set osabi GNU/Linux
(gdb) exec /bin/dash
(gdb) core /root/core.0
warning: core file may not match specified executable file.
[New LWP 366]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1".
Core was generated by `/bin/sh /etc/init.d/mountkernfs.sh reload'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=3222954656, signo=6, no_tid=0)
    at pthread_kill.c:44
44      pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (threadid=3222954656, signo=6, no_tid=0)
    at pthread_kill.c:44
#1  0xc00a7080 in __pthread_kill_internal (signo=6, threadid=3222954656)
    at pthread_kill.c:78
#2  __GI___pthread_kill (threadid=3222954656, signo=6) at pthread_kill.c:89
#3  0xc0064c22 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
#4  0xc0052faa in __GI_abort () at abort.c:79
#5  0xc009b328 in __libc_message (action=<optimized out>, fmt=<optimized out>)
    at ../sysdeps/posix/libc_fatal.c:155
#6  0xc012a3c2 in __GI___fortify_fail (
    msg=0xc0182c5e "stack smashing detected") at fortify_fail.c:26
#7  0xc012a3a0 in __stack_chk_fail () at stack_chk_fail.c:24
#8  0xc00e0172 in __wait3 (stat_loc=<optimized out>, options=<optimized out>, 
    usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait3.c:41
#9  0xd000c38e in ?? ()
#10 0xefee111e in ?? ()
#11 0x00000000 in ?? ()
(gdb) 

It appears that the failure was in glibc (though I guess the root cause 
may lie elsewhere). I have two more core files generated by dash (actually 
by `/bin/sh /etc/rcS.d/S08mountall.sh start') that give the same 
backtrace. So even though the failure is intermittent, the site of the 
buffer overrun seems to be consistent.
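
Since the overrun always lands in the same frame, it might be possible to 
catch it in the act by breaking on __wait3 and putting a watchpoint on the 
canary slot before the overrun happens. A sketch only: the actual canary 
address has to be read off the prologue disassembly (CANARY_ADDR below is 
a placeholder), and hardware watchpoints may not be available on this 
target, in which case gdb falls back to very slow software watchpoints.

(gdb) break __wait3
(gdb) run
(gdb) disassemble
(gdb) watch -location *(unsigned long *)CANARY_ADDR

where CANARY_ADDR is the stack slot the prologue stores the canary into; 
gdb should then stop at whatever instruction clobbers it.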

Looking at sysdeps/unix/sysv/linux/wait3.c, I guess the only possible 
place for a buffer overrun would be struct __rusage64 usage64.
https://sources.debian.org/src/glibc/2.36-8/sysdeps/unix/sysv/linux/wait3.c/?hl=41#L41

(gdb) select-frame 8
(gdb) print usage64
$3 = {ru_utime = {tv_sec = 6481621047248640, tv_usec = 91671782025504}, 
  ru_stime = {tv_sec = 25769811968, tv_usec = 8591449888}, {
    ru_maxrss = 1515296, __ru_maxrss_word = 1515296}, {ru_ixrss = 1515296, 
    __ru_ixrss_word = 1515296}, {ru_idrss = 224, __ru_idrss_word = 224}, {
    ru_isrss = 224, __ru_isrss_word = 224}, {ru_minflt = 6, 
    __ru_minflt_word = 6}, {ru_majflt = 4, __ru_majflt_word = 4}, {
    ru_nswap = 4, __ru_nswap_word = 4}, {ru_inblock = 372, 
    __ru_inblock_word = 372}, {ru_oublock = 0, __ru_oublock_word = 0}, {
    ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 8, 
    __ru_msgrcv_word = 8}, {ru_nsignals = 367, __ru_nsignals_word = 367}, {
    ru_nvcsw = 10, __ru_nvcsw_word = 10}, {ru_nivcsw = 0, 
    __ru_nivcsw_word = 0}}
(gdb)

Of course, at this point the damage has already been done and the culprit 
has gone. I guess there was a buffer overrun during the call to 
__wait4_time64(). 
https://sources.debian.org/src/glibc/2.36-8/sysdeps/unix/sysv/linux/wait4.c/?hl=26#L26

It's hard to follow the glibc source without knowing how all the macros 
(such as __KERNEL_OLD_TIMEVAL_MATCHES_TIMEVAL64 and __TIMESIZE) were set 
for this build.

It would be disappointing if rusage64_to_rusage() in __wait3() were being 
applied to the result of rusage32_to_rusage64() from __wait4_time64(). 
Perhaps the #ifdefs are arranged in such a way that that round trip never 
happens...

Anyway, does anyone know how to get a hex dump of the whole stack frame 
including the canary, in case there is something to be learned from that?
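
(The closest I have found so far, as a sketch: `info frame' prints the 
frame base and the saved-register slots, and the x command can then dump 
everything from the stack pointer upward, which should cover the canary 
slot, assuming gdb's frame bounds are right.)

(gdb) frame 8
(gdb) info frame
(gdb) x/64xw $sp

If this glibc build keeps the reference value in the global 
__stack_chk_guard rather than in the thread control block, it can also be 
printed for comparison with `print/x __stack_chk_guard'.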


