Re: core dump analysis, was Re: stack smashing detected

Finn Thain <fthain@xxxxxxxxxxxxxx> · Sun, 9 Apr 2023 14:02:14 +1000 (AEST)

On Tue, 4 Apr 2023, I wrote:

> The actual corruption might offer a clue here. I believe the saved %a3 
> was clobbered with the value 0xefee1068 which seems to be a pointer into 
> some stack frame that would have come into existence shortly after 
> __GI___wait4_time64 was called.

Wrong... it is a pointer to the location below the __wait3 stack frame.

(gdb) info frame
Stack level 8, frame at 0xefee10e0:
 pc = 0xc00e0172 in __wait3 (../sysdeps/unix/sysv/linux/wait3.c:41); 
    saved pc = 0xd000c38e
 called by frame at 0xefee11dc, caller of frame at 0xefee106c
 source language c.
 Arglist at 0xefee10d8, args: stat_loc=<optimized out>, 
    options=<optimized out>, usage=<optimized out>
 Locals at 0xefee10d8, Previous frame's sp is 0xefee10e0
 Saved registers:
  a2 at 0xefee106c, a3 at 0xefee1070, a5 at 0xefee1074, fp at 0xefee10d8,
  pc at 0xefee10dc

That shows %a2 was saved at 0xefee106c, and the address of interest is the 
stack location immediately below that. But it has no particular 
significance: it holds a NULL pointer when the struct __rusage64 *usage 
argument to __wait4_time64() gets pushed there:

   0xc00e0152 <__wait3+226>:   clrl %sp@-
   0xc00e0154 <__wait3+228>:   movel %fp@(12),%sp@-
   0xc00e0158 <__wait3+232>:   movel %d0,%sp@-
   0xc00e015a <__wait3+234>:   pea 0xffffffff
   0xc00e015e <__wait3+238>:   bsrl 0xc00e0174 <__GI___wait4_time64>

But it's no longer a NULL pointer at the time of the crash, though it 
should be, since that stack frame is still active.

(gdb) x/16z 0xefee1068
0xefee1068:     0xc00e0172      0xd001e718      0xd001e498      0xd001b874
0xefee1078:     0x00170700      0x00170700      0x00170700      0x00005360
0xefee1088:     0x0000e920      0x00000006      0x00002000      0x00000002
0xefee1098:     0x00171f20      0x00171f20      0x00171f20      0x000000e0

Beats me.
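
For anyone who wants to check those slots without a gdb session, here is a 
minimal Python sketch that indexes the x/16z output above by address. The 
helper name parse_dump is made up for illustration; the dump text and the 
slot addresses are copied from the gdb output above.

```python
# Output of "x/16z 0xefee1068", copied from the gdb session above.
DUMP = """\
0xefee1068:     0xc00e0172      0xd001e718      0xd001e498      0xd001b874
0xefee1078:     0x00170700      0x00170700      0x00170700      0x00005360
0xefee1088:     0x0000e920      0x00000006      0x00002000      0x00000002
0xefee1098:     0x00171f20      0x00171f20      0x00171f20      0x000000e0
"""

def parse_dump(text, word_size=4):
    """Map each address covered by the dump to its 32-bit word."""
    words = {}
    for line in text.splitlines():
        base, _, rest = line.partition(":")
        if not rest:
            continue  # skip lines that are not "address: values"
        for i, value in enumerate(rest.split()):
            words[int(base, 16) + i * word_size] = int(value, 16)
    return words

words = parse_dump(DUMP)

# Slots named by "info frame" for frame 8, plus the word just below a2.
slots = {
    "below a2": 0xefee1068,
    "a2": 0xefee106c,
    "a3": 0xefee1070,
    "a5": 0xefee1074,
}
for name, addr in slots.items():
    print(f"{name:>8} @ {addr:#010x} = {words[addr]:#010x}")
```

The first line printed is the word at 0xefee1068, which is the slot under 
discussion.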

At the time of the crash, the corrupted %a3 was a pointer to a location in 
__wait3's stack frame. That location held a NULL pointer (the *usage 
parameter) when __GI___wait4_time64 was called, but it now holds 0xc00e0172, 
an address just after the end of the __wait3 text and just before the 
__GI___wait4_time64 text.

(gdb) disass __wait3
Dump of assembler code for function __wait3:
...
   0xc00e015e <+238>:   bsrl 0xc00e0174 <__GI___wait4_time64>
   0xc00e0164 <+244>:   lea %sp@(16),%sp
   0xc00e0168 <+248>:   braw 0xc00e00b2 <__wait3+66>
   0xc00e016c <+252>:   bsrl 0xc012a38c <__stack_chk_fail>
End of assembler dump.
(gdb) disass __GI___wait4_time64
Dump of assembler code for function __GI___wait4_time64:
   0xc00e0174 <+0>:     lea %sp@(-80),%sp
   0xc00e0178 <+4>:     moveml %d2-%d5/%a2-%a3/%a5,%sp@-
   0xc00e017c <+8>:     lea %pc@(0xc0198000),%a5
   0xc00e0184 <+16>:    movel %sp@(116),%d2
...

But I realize now that this stack location gets overwritten with the 
return address for bsrl __stack_chk_fail, so there's nothing wrong there.
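
The overwrite can be replayed with a toy model of the stack. This is only a 
sketch under two assumptions that are not stated in the gdb output: that %sp 
is 0xefee106c (the lowest saved-register slot) just before the argument 
pushes at <+226>, and that %sp has the same value again when the bsrl at 
<+252> executes. The Stack class is hypothetical; the addresses and offsets 
are taken from the disassembly above.

```python
WORD = 4  # every push here is a 32-bit longword

class Stack:
    """Toy descending stack: push pre-decrements sp, then stores."""
    def __init__(self, sp):
        self.sp = sp
        self.mem = {}

    def push(self, value):
        self.sp -= WORD
        self.mem[self.sp] = value

# Length of a bsrl instruction, read off the disassembly:
# the bsrl at +238 is followed by the lea at +244.
BSRL_LEN = 0xc00e0164 - 0xc00e015e

# Assumption: sp sits at the saved %a2 slot before the pushes begin.
s = Stack(0xefee106c)

s.push(0)                       # +226 clrl %sp@-   (usage = NULL)
usage_slot = s.sp               # remember where the NULL landed
s.push("fp@(12)")               # +228 movel %fp@(12),%sp@-
s.push("d0")                    # +232 movel %d0,%sp@-
s.push(0xffffffff)              # +234 pea 0xffffffff
s.push(0xc00e015e + BSRL_LEN)   # +238 bsrl pushes its return address

s.sp += WORD                    # callee returns, popping the return address
s.sp += 16                      # +244 lea %sp@(16),%sp drops four arguments

# Assumption: sp is unchanged when the stack check fails at +252.
s.push(0xc00e016c + BSRL_LEN)   # +252 bsrl __stack_chk_fail

print(hex(usage_slot), hex(s.mem[usage_slot]))  # prints: 0xefee1068 0xc00e0172
```

Under those assumptions the return address pushed by bsrl __stack_chk_fail 
lands in the slot that previously held the NULL usage argument, which 
matches the first word of the x/16z dump.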

Perhaps it's just a coincidence that the saved %a3, once corrupted, ended 
up pointing to the *usage parameter... I don't know what to make of that.