On Thu, Oct 16, 2008 at 3:54 PM, Dave Anderson <anderson@xxxxxxxxxx> wrote:
>
> ----- "Mike Snitzer" <snitzer@xxxxxxxxx> wrote:
>
>> Frame 0 of crash's core shows:
>> (gdb) bt
>> #0  0x0000003b708773e0 in memset () from /lib64/libc.so.6
>>
>> I'm not sure how to get the faulting address though?  Is it just
>> 0x0000003b708773e0?
>
> No, that's the text address in memset().  If you "disass memset",
> I believe that you'll see that the address above is dereferencing
> the rcx register/pointer.  So then, if you enter "info registers",
> you'll get a register dump, and rcx would be the failing address.

OK.

0x0000003b708773e0 <memset+192>:        movnti %r8,(%rcx)

(gdb) info registers
...
rcx            0xa7b000         10989568

(gdb) x/x 0xa7b000
0xa7b000:       Cannot access memory at address 0xa7b000

>> I've not rebooted the system at all either... now when I run 'kmem -s'
>> in live crash I see:
>>
>> CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
>> ...
>> kmem: nfs_direct_cache: full list: slab: ffff810073503000  bad inuse counter: 5
>> kmem: nfs_direct_cache: full list: slab: ffff810073503000  bad inuse counter: 5
>> kmem: nfs_direct_cache: partial list: bad slab pointer: 88
>> kmem: nfs_direct_cache: full list: bad slab pointer: 98
>> kmem: nfs_direct_cache: free list: bad slab pointer: a8
>> kmem: nfs_direct_cache: partial list: bad slab pointer: 9f911029d74e35b
>> kmem: nfs_direct_cache: full list: bad slab pointer: 6b6b6b6b6b6b6b6b
>> kmem: nfs_direct_cache: free list: bad slab pointer: 6b6b6b6b6b6b6b6b
>> kmem: nfs_direct_cache: partial list: bad slab pointer: 100000001
>> kmem: nfs_direct_cache: full list: bad slab pointer: 100000011
>> kmem: nfs_direct_cache: free list: bad slab pointer: 100000021
>> ffff810073501600 nfs_direct_cache         192          2        40      2     4k
>> ...
>
> Are those warnings happening on *every* slab type?  When you run on a
> live system, the "shifting sands" of the kernel underneath the crash
> utility can cause errors like the above.  But at least some/most of
> the other slabs' infrastructure should remain stable while the command
> runs.

Ah, makes sense; yes, many of them do remain stable:

kmem: request_sock_TCPv6: full list: bad slab pointer: 79730070756b6f7f
kmem: request_sock_TCPv6: free list: bad slab pointer: 79730070756b6f8f
ffff810079199240 request_sock_TCPv6       160          0         0      0     4k
ffff81007919a200 TCPv6                   1896          3         4      2     4k
ffff81007dcb41c0 dm_mpath_io               64          0         0      0     4k
...
ffff81007d9ce580 sgpool-8                 280          2        42      3     4k
ffff81007d9cf540 scsi_bidi_sdb             48          0         0      0     4k
ffff81007d98b500 scsi_io_context          136          0         0      0     4k
ffff81007d95e4c0 ext3_inode_cache         992      38553     38712   9678     4k
ffff81007d960480 ext3_xattr               112         68       102      3     4k
etc.

>> But if I run crash against the vmcore I do get the segfault...
>
> When you run it on the vmcore, do you get the segfault immediately?
> Or do some slabs display their stats OK, but then when it deals with
> one particular slab it generates the segfault?
>
> I mean that it's possible that the target slab was in transition
> at the time of the crash, in which case you might see some error
> messages like you see on the live system.  But it is difficult to
> explain why it's dying specifically where it is, even if the slab
> was in transition.
>
> That all being said, even if the slab was in transition, obviously
> the crash utility should be able to handle it more gracefully...

None of the slabs display their stats OK; crash segfaults immediately.
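For what it's worth, here is roughly the kind of "more graceful" handling
I had pictured.  This is only a sketch on my part, not the actual kmem -s
code: I'm assuming crash's readmem()/IS_KVADDR()/error() interfaces and the
RETURN_ON_ERROR|QUIET behavior are what I think they are, and the function
name and next_offset parameter are made up purely for illustration:

/*
 * Illustrative sketch only: when walking a slab cache's partial/full/free
 * lists, read each link with RETURN_ON_ERROR|QUIET rather than the default
 * FAULT_ON_ERROR, and sanity-check it before following it, so a corrupt or
 * in-transition list yields a "bad slab pointer" warning instead of a
 * segfault.
 */
#include "defs.h"       /* crash utility: readmem(), IS_KVADDR(), error() */

static int
walk_slab_list_safely(ulong head, long next_offset)
{
        ulong slab, next;

        for (slab = head; slab; slab = next) {
                if (!IS_KVADDR(slab)) {
                        error(INFO, "bad slab pointer: %lx\n", slab);
                        return FALSE;
                }
                /* read the list link; bail out rather than fault */
                if (!readmem(slab + next_offset, KVADDR, &next,
                    sizeof(ulong), "slab list pointer",
                    RETURN_ON_ERROR|QUIET)) {
                        error(INFO, "cannot read slab at %lx\n", slab);
                        return FALSE;
                }
                if (next == head)       /* wrapped around to the list head */
                        break;
        }
        return TRUE;
}

No idea how close that is to how the slab walking is actually structured,
so take it with a grain of salt.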
>> > BTW, if need be, would you be able to make the vmlinux/vmcore pair
>> > available for download somewhere?  (You can contact me off-list with
>> > the particulars...)
>>
>> I can work to make that happen if needed...
>
> FYI, I did try our RHEL5 "debug" kernel (2.6.18 + hellofalotofpatches),
> which has both CONFIG_DEBUG_SLAB and CONFIG_DEBUG_SLAB_LEAK turned on,
> but I don't see the problem.  So unless something obvious can be
> determined, that may be the only way I can help.

Interesting.  OK, I'll work to upload them somewhere and I'll send you a
pointer off-list.

Thanks!
Mike

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility