Re: Fixed strace [ was Re: ls -l is broken ]

"John David Anglin" <dave@xxxxxxxxxxxxxxxxxx> · Sat, 9 May 2009 13:18:03 -0400 (EDT)

> On Wed, May 06, 2009 at 01:39:49PM -0400, John David Anglin wrote:
> > > The tombstone is:
> > > 
> > > do_page_fault() pid=10205 command='strace' type=15 address=0x407d2f18
> > > vm_start = 0x4068d000, vm_end = 0x4068f000
> > 
> > So, the pointer passed to __canonicalize_funcptr_for_compare is outside
> > the vm range.
> > 
> > Maybe "info sharedlib" will show something.  Need to find out why the
> > address of the function descriptor is outside the vm range.
> > 
> 
> 405c0000-405c2000 rwxp 405c0000 00:00 0 
> 
> is the output of /proc/maps there... No idea wtf this is. :/

What are vm_start and vm_end?

Another segv last night.

Core was generated by `rm -f ada/bldtools/nmake_s/sinfo.ads ada/bldtools/nmake_s/nmake.adt ada/bldtool'.
Program terminated with signal 11, Segmentation fault.
[New process 12287]
#0  _dl_relocate_object (scope=0x40000db8, lazy=<value optimized out>, 
    consider_profiling=0) at do-rel.h:119
119	do-rel.h: No such file or directory.
	in do-rel.h

(gdb) p/x $pc
$1 = 0x402759ac

0x40275980 <_dl_relocate_object+728>:	ldw 4(ret0),r17
0x40275984 <_dl_relocate_object+732>:	ldw 4(r9),r21
0x40275988 <_dl_relocate_object+736>:	extrw,u r21,23,24,r22
0x4027598c <_dl_relocate_object+740>:	depw,z r22,27,28,r13
0x40275990 <_dl_relocate_object+744>:	ldw 0(r9),r20
0x40275994 <_dl_relocate_object+748>:	add,l r16,r13,r8
0x40275998 <_dl_relocate_object+752>:	stw r8,8(r3)
0x4027599c <_dl_relocate_object+756>:	add,l r12,r20,r10
0x402759a0 <_dl_relocate_object+760>:	ldb c(r8),ret0
0x402759a4 <_dl_relocate_object+764>:	extrw,u r21,31,8,r6
0x402759a8 <_dl_relocate_object+768>:	extrw,u ret0,27,28,ret0
0x402759ac <_dl_relocate_object+772>:	ldh,s r22(r17),r31
0x402759b0 <_dl_relocate_object+776>:	ldw 170(r11),ret1
0x402759b4 <_dl_relocate_object+780>:	cmpib,= 0,ret0,0x40275a64 <_dl_relocate_object+956>
0x402759b8 <_dl_relocate_object+784>:	copy r11,r5
End of assembler dump.
(gdb) p/x $r11
$3 = 0x40000c00
(gdb) p/x $r11 + 0xe4
$4 = 0x40000ce4
(gdb) x/x 0x40000ce4 
0x40000ce4:	0x4071e828
(gdb) x/x 0x4071e828 + 4
0x4071e82c <.LC2+200>:	0xa38cf763

(gdb) p/x $r22
$6 = 0x1

May  8 22:39:41 mx3210 kernel: do_page_fault() pid=12287 command='rm' type=15 address=0xa38cf765
May  8 22:39:41 mx3210 kernel: vm_start = 0x40724000, vm_end = 0x40726000

So, the address for the ldh,s (0xa38cf763 + 2*r22 = 0xa38cf765) matches the
fault address in the tombstone.
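
The address arithmetic can be checked with a short sketch. It assumes that
r17 holds the word read at 0x4071e828 + 4 (the session only shows the memory
contents, not r17 itself) and that the ,s completer scales the index by the
2-byte halfword size. The constant names are made up for illustration.

```python
# Values copied from the gdb session in this message.
R11 = 0x40000c00            # p/x $r11
SLOT = R11 + 0xe4           # slot examined with x/x, holds 0x4071e828
BASE = 0xa38cf763           # x/x 0x4071e828 + 4 (assumed to be the value in r17)
INDEX = 0x1                 # p/x $r22


def ldh_scaled_address(base: int, index: int) -> int:
    """Effective address of `ldh,s index(base)`.

    The index register is shifted left by one (halfword scaling) and added
    to the base; the result is wrapped to 32 bits.
    """
    return (base + (index << 1)) & 0xFFFFFFFF


fault = ldh_scaled_address(BASE, INDEX)
print(hex(SLOT), hex(fault))
```

The second value printed is the address reported by do_page_fault() for pid
12287.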

...
002ae000-005f5000 rwxp 002ae000 00:00 0                                  [heap]
40000000-4000c000 rw-p 40000000 00:00 0 
4000c000-40011000 r-xp 00000000 08:03 640603                             /lib/libthread_db-1.0.so
...
4066d000-40670000 rwxp 0007a000 08:03 641636                             /lib/libm-2.9.so
40670000-4073d000 rw-p 4028d000 00:00 0 
...
40b26000-410fe000 rw-p 40b26000 00:00 0 
c0215000-c022a000 rwxp c0215000 00:00 0                                  [stack]

I don't see anything in /proc/maps that matches the vm range in the tombstone.
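
A minimal sketch of that lookup, for anyone who wants to repeat it. It only
covers the map lines quoted in this message (the "..." lines are elided and
cannot be checked), and the helper names are hypothetical.

```python
# Only the mappings quoted above; the elided ("...") lines are not included.
MAPS_EXCERPT = """\
002ae000-005f5000 rwxp 002ae000 00:00 0                                  [heap]
40000000-4000c000 rw-p 40000000 00:00 0
4000c000-40011000 r-xp 00000000 08:03 640603                             /lib/libthread_db-1.0.so
4066d000-40670000 rwxp 0007a000 08:03 641636                             /lib/libm-2.9.so
40670000-4073d000 rw-p 4028d000 00:00 0
40b26000-410fe000 rw-p 40b26000 00:00 0
c0215000-c022a000 rwxp c0215000 00:00 0                                  [stack]
"""


def parse_maps(text):
    """Return (start, end, perms) tuples for each non-empty maps line."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        regions.append((start, end, fields[1]))
    return regions


def exact_match(regions, vm_start, vm_end):
    """Mappings whose bounds equal the vm range from the tombstone."""
    return [r for r in regions if r[0] == vm_start and r[1] == vm_end]


def containing(regions, addr):
    """Mappings that contain addr (end address is exclusive)."""
    return [r for r in regions if r[0] <= addr < r[1]]


regions = parse_maps(MAPS_EXCERPT)
print(exact_match(regions, 0x40724000, 0x40726000))
print(containing(regions, 0xa38cf765))
```

Both lookups come back empty for the quoted lines: no mapping has exactly the
tombstone's bounds, and none contains the faulting address.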

Comparing memory for the core dump with a normal start to main:

Core dump
(gdb) x/64x 0x4071e800
0x4071e800 <.LC2+156>:	0x6ffffffc	0x000129a8	0x6ffffffd	0x00000013
0x4071e810 <.LC2+172>:	0x0000001e	0x00000014	0x6ffffffe	0x00012c3c
0x4071e820 <.LC2+188>:	0x6fffffff	0x00000001	0x6ffffff0	0xa38cf763
0x4071e830 <.LC2+204>:	0x6ffffff9	0x00000d5b	0x00000000	0x00000000
0x4071e840 <.LC2+220>:	0x00000000	0x00000000	0x00000000	0x00000000
0x4071e850 <.LC2+236>:	0x00000000	0x00000000	0x00000000	0x00000000
0x4071e860 <__libc_multiple_libcs>:	0x00000001	0x00000000	0x00000000	0x00000000
0x4071e870 <__gconv_lock>:	0x00000000	0x00000000	0x00000000	0x00000000
0x4071e880 <__gconv_lock+16>:	0x00000001	0x00000001	0x00000001	0x00000001
0x4071e890 <__gconv_lock+32>:	0x00000000	0x00000000	0x00000000	0x00000000
0x4071e8a0 <lock.11041>:	0x00000000	0x00000000	0x00000000	0x00000000
0x4071e8b0 <lock.11041+16>:	0x00000001	0x00000001	0x00000001	0x00000001
0x4071e8c0 <lock.11041+32>:	0x00000000	0x00000000	0x00000000	0x00000000
0x4071e8d0 <lock>:	0x00000000	0x00000000	0x00000000	0x00000000
0x4071e8e0 <lock+16>:	0x00000001	0x00000001	0x00000001	0x00000001
0x4071e8f0 <lock+32>:	0x00000000	0x00000000	0x00000000	0x00000000

Normal:
(gdb) x/64x 0x4071e800
0x4071e800 <.LC2+156>:	0x6ffffffc	0x000129a8	0x6ffffffd	0x00000013
0x4071e810 <.LC2+172>:	0x0000001e	0x00000014	0x6ffffffe	0x00012c3c
0x4071e820 <.LC2+188>:	0x6fffffff	0x00000001	0x6ffffff0	0x405ea8cc
0x4071e830 <.LC2+204>:	0x6ffffff9	0x00000d5b	0x00000000	0x00000000
0x4071e840 <.LC2+220>:	0x00000000	0x00000000	0x00000000	0x00000000
0x4071e850 <.LC2+236>:	0x00000000	0x00000000	0x00000000	0x00000000
0x4071e860 <__libc_multiple_libcs>:	0x00000000	0x00000000	0x00000000	0x00000000
0x4071e870 <__gconv_lock>:	0x00000000	0x00000000	0x00000000	0x00000000
0x4071e880 <__gconv_lock+16>:	0x00000001	0x00000001	0x00000001	0x00000001
0x4071e890 <__gconv_lock+32>:	0x00000000	0x00000000	0x00000000	0x00000000
0x4071e8a0 <lock.11041>:	0x00000000	0x00000000	0x00000000	0x00000000
0x4071e8b0 <lock.11041+16>:	0x00000001	0x00000001	0x00000001

It sure looks as if memory has been stomped, specifically the word that
caused the segv.  The surrounding values are the same.
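
The comparison can be done mechanically. The sketch below diffs the three
rows from 0x4071e810 to 0x4071e83c, copied from the two dumps in this
message; the function name is made up. As a side note, and only as an
assumption, the word pairs look like ELF dynamic entries, in which case the
tag 0x6ffffff0 just before the stomped word is DT_VERSYM in elf.h.

```python
# Rows 0x4071e810..0x4071e83c copied from the two x/64x dumps.
BASE_ADDR = 0x4071e810
CORE = [
    0x0000001e, 0x00000014, 0x6ffffffe, 0x00012c3c,
    0x6fffffff, 0x00000001, 0x6ffffff0, 0xa38cf763,
    0x6ffffff9, 0x00000d5b, 0x00000000, 0x00000000,
]
NORMAL = [
    0x0000001e, 0x00000014, 0x6ffffffe, 0x00012c3c,
    0x6fffffff, 0x00000001, 0x6ffffff0, 0x405ea8cc,
    0x6ffffff9, 0x00000d5b, 0x00000000, 0x00000000,
]


def diff_words(base, a, b):
    """Return (address, word_in_a, word_in_b) for each 32-bit word that differs."""
    return [
        (base + 4 * i, x, y)
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


for addr, core_word, normal_word in diff_words(BASE_ADDR, CORE, NORMAL):
    print(f"{addr:#010x}: core={core_word:#010x} normal={normal_word:#010x}")
```

Within these rows the only word reported is the one at 0x4071e82c.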

Dave
-- 
J. David Anglin                                  dave.anglin@xxxxxxxxxxxxxx
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)