Re: stack smashing detected

Michael Schmitz <schmitzmic@xxxxxxxxx> · Tue, 31 Jan 2023 16:05:15 +1300

Hi Stan,

On 30.01.2023 at 17:00, Stan Johnson wrote:
> Hello,
>
> I am seeing anywhere from zero to four of the following errors while
> booting Linux on 68030 systems and using sysvinit startup scripts:
>
> *** stack smashing detected ***: terminated
> Aborted
>
> I usually (but not always) see three of the errors while init is running
> the rcS.d scripts, and one while running the rc2.d scripts. The stack
> smashing messages appear only on the system console (nothing is logged
> in an error log or dmesg). Despite the errors, the system continues
> booting to multiuser mode without any obvious additional problems. I
> haven't tested systemd, which is too slow to be useful on my m68k
> systems (though I have a Debian SID with systemd that I can restore for
> testing if necessary).
>
> I'm using the current Debian SID and Debian kernel, and I've confirmed
> the errors on a Mac IIci and SE/30. I haven't seen the errors on any
> 68040 system (I only tested on a Centris 650 and PowerBook 550c). I also
> notice the errors on 68030 systems using custom kernels that I have
> cross-compiled using GCC 12 or GCC 10 on a x86_64 system running Debian
> SID; however, I do not see the errors as often if I cross-compile using
> GCC 8.3.0 on a 686 system (running Debian 10.7 Buster) -- I saw the
> errors a few weeks ago with an earlier kernel, but none today using
> Linux 6.1.8 cross-compiled with GCC 8.3.0.
>
> I'll be happy to help debug or troubleshoot, though at this point, since
> the "stack smashing detected" errors aren't reporting which processes
> are being terminated/aborted, I'm not sure where to start.

The man page of init states that init logs the process and the reason
for its termination in /var/run/utmp and /var/log/wtmp each time a child
process terminates. You're looking for processes terminated by SIGABRT,
as far as I can see.

There does not appear to be any tool to extract that information from 
utmp/wtmp files though - utmpdump only shows login process information 
for me, nothing on init processes.
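
For illustration only: the "reason for termination" a parent such as init
gets to see is the child's wait status. The Python sketch below is not
taken from init; the helper names describe_wait_status and
run_and_describe are made up here. It shows how a child killed by SIGABRT
can be told apart from one that exited normally.

```python
import os
import signal


def describe_wait_status(status):
    """Turn a raw wait status (as returned by os.waitpid) into readable text."""
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal, e.g. SIGABRT after abort().
        signum = os.WTERMSIG(status)
        return f"killed by {signal.Signals(signum).name}"
    if os.WIFEXITED(status):
        return f"exited with status {os.WEXITSTATUS(status)}"
    return f"unrecognised wait status {status}"


def run_and_describe(argv):
    """Fork, exec argv in the child, wait for it, and describe how it ended."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; never fall back into the caller.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return describe_wait_status(status)


if __name__ == "__main__":
    # Hypothetical stand-in for a crashing binary: the shell aborts itself.
    print(run_and_describe(["sh", "-c", "kill -ABRT $$"]))
```

Whether sysvinit actually stores this status in the utmp/wtmp records is
something I have not verified.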

Another way may be logging the start of each of the rcS.d or rc2.d 
scripts until you know what scripts to look at in more detail, then 
adding 'set -v' at the start of those to log every command in the 
offending script.
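
A rough sketch of the "log the start of each script" idea, in Python and
under a few assumptions: run_scripts_with_log is a made-up helper, it is
run by hand on an already booted system rather than from init, and it
treats every executable file in the directory as a script to call with
"start" (the real rc logic does more than that, and re-running start
scripts can have side effects). The aborted binary is most likely a
command inside a script, so the return code logged here will not
necessarily show the signal; the point is only to bracket the console
message between a START and an END line.

```python
import os
import subprocess
import sys
import time


def run_scripts_with_log(directory, log=sys.stderr, arg="start"):
    """Run each executable file in `directory` in sorted order.

    A START line is written before each script and an END line with the
    raw return code after it. Returns a list of (name, returncode).
    """
    results = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories and anything that is not executable.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue
        print(f"{time.strftime('%H:%M:%S')} START {name}", file=log, flush=True)
        proc = subprocess.run([path, arg])
        print(f"{time.strftime('%H:%M:%S')} END   {name} rc={proc.returncode}",
              file=log, flush=True)
        results.append((name, proc.returncode))
    return results


if __name__ == "__main__":
    # Directory is given on the command line, e.g. a copy of /etc/rcS.d.
    run_scripts_with_log(sys.argv[1])
```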

Once the offending binary is known (and the crash can be reproduced 
after system boot), gdb can be used to find the function that overwrote 
its local stack guard.

That's a lot of work on a 030 Mac - have you tried to reproduce this on 
any kind of emulator?

I suppose one difference between your 030 and 040 Macs might be the 
amount of RAM available. I wonder if this bug results from a combination 
of 030 MMU and memory pressure, or 030 MMU only.

Cheers,

	Michael

> thanks for any suggestions
>
> -Stan Johnson   userm57@xxxxxxxxx