----- Original Message -----
>
> Thank you very much for the info, really helpful and very much
> appreciated. I have a few follow on questions:
>
> 1. When the page fault occurs, are some of the registers (which might
> contain parameters passed to the offending function) trampled on? If
> yes, is there a document or would you happen to know what registers
> (in the worst case) are written to.
>
> The reason I ask is in my dump below the register - RDI (used to pass
> the first param to ahaahh() ) should be zero (to have caused the
> page fault), but it is not.

RDI was originally passed into ahaann() as an argument, and the evidence
shows that it had a value of NULL. However, it was subsequently needed
as an argument register for the call to ahahtl() at ahaann+37. So before
being reused, RDI was copied/saved in RBX at ahaann+22. And then RDI
was overwritten/reused at ahaann+28:

>
> From register dump after panic:
> RBX: 0000000000000000 RDI: ffff88035daef4e0 (I expect this to be zero
> per the dis-assembly code).
>
> Reverse dis-assembly from RIP when panic occurred:
> crash> dis -r ffffffffa06ce48f
> 0xffffffffa06ce460 <ahaann>: push %rbp
> 0xffffffffa06ce461 <ahaann+1>: mov %rsp,%rbp
> 0xffffffffa06ce464 <ahaann+4>: push %r12
> 0xffffffffa06ce466 <ahaann+6>: push %rbx
> 0xffffffffa06ce467 <ahaann+7>: nopl 0x0(%rax,%rax,1)
> 0xffffffffa06ce46c <ahaann+12>: mov $0xffffffffa092c548,%rdx
> 0xffffffffa06ce473 <ahaann+19>: movzwl %si,%ecx
> 0xffffffffa06ce476 <ahaann+22>: mov %rdi,%rbx <==========
> 0xffffffffa06ce479 <ahaann+25>: mov %esi,%r12d
> 0xffffffffa06ce47c <ahaann+28>: mov $0xffffffffa092e5f0,%rdi
> 0xffffffffa06ce483 <ahaann+35>: xor %esi,%esi
> 0xffffffffa06ce485 <ahaann+37>: callq 0xffffffffa06cd860 <ahahtl>
> 0xffffffffa06ce48a <ahaann+42>: test %rax,%rax
> 0xffffffffa06ce48d <ahaann+45>: jne 0xffffffffa06ce500 <ahaann+160>
> 0xffffffffa06ce48f <ahaann+47>: mov (%rbx),%rdi <==========

And so in your case, the page fault was caused by the NULL pointer in RBX,
which was originally passed into the function in RDI.

> 2. Does Linux (specifically crash) treat access to invalid address or
> NULL ptr dereference the same way, as in calling them both page
> fault? (In one of my past work places, the crash dump was explicit
> in stating when a NULL ptr dereference occurred, and I am wondering
> now if that was due to a customization in crash).

The crash utility doesn't have anything to do with it -- it is simply
trying to resurrect what happened from what it sees left on the stack.
The kernel will transition to page_fault() on either a NULL pointer or
an invalid address (although sometimes an invalid address will generate
a general protection fault exception if certain bits are set in the bad
address).

If you run the "log" command, you will see a string preceding the final
blurb (the register dump and backtrace) that also confirms what kind of
exception occurred.
Yours probably says:

  BUG: unable to handle kernel NULL pointer dereference at (null)

which gets generated here in the kernel's show_fault_oops() function:

        printk(KERN_ALERT "BUG: unable to handle kernel ");
        if (address < PAGE_SIZE)
                printk(KERN_CONT "NULL pointer dereference");
        else
                printk(KERN_CONT "paging request");
        printk(KERN_CONT " at %p\n", (void *) address);
        printk(KERN_ALERT "IP:");
        printk_address(regs->ip, 1);

>
> 3. Expanding on the meaning of the address in [] at the beginning of
> each line of the bt
>
> [addr0] function0 at addr2
> [addr1] function1 at addr2
>
> addr1 - 8 : starting address of the stack frame from function1 upto
> the addr0. I can use this info to peek into the values of function
> local variables pushed onto the stack (specifically the function's
> stack frame).

Exactly -- you can use "bt -f" or "bt -F" to do just that. The -f option
dumps the raw stack frame data, whereas -F also translates each stack
entry into a known symbol+offset, or into the name of the slab cache it
came from, where either applies. For example:

  crash> bt
  ...
  #12 [ffff880037cb9ef0] vfs_write at ffffffff81172718
  #13 [ffff880037cb9f30] sys_write at ffffffff81173151
  ...

  crash> bt -f
  ...
  #12 [ffff880037cb9ef0] vfs_write at ffffffff81172718
      ffff880037cb9ef8: ffff880037cb9f78 ffffffff810d1b62
      ffff880037cb9f08: ffff880078056260 ffff8800781248c0
      ffff880037cb9f18: 00007f9b6f177000 0000000000000002
      ffff880037cb9f28: ffff880037cb9f78 ffffffff81173151
  #13 [ffff880037cb9f30] sys_write at ffffffff81173151
  ...

  crash> bt -F
  ...
  #12 [ffff880037cb9ef0] vfs_write at ffffffff81172718
      ffff880037cb9ef8: ffff880037cb9f78 audit_syscall_entry+626
      ffff880037cb9f08: [size-1024]      [filp]
      ffff880037cb9f18: 00007f9b6f177000 0000000000000002
      ffff880037cb9f28: ffff880037cb9f78 sys_write+81
  #13 [ffff880037cb9f30] sys_write at ffffffff81173151
  ...

Oftentimes the [slab-cache] or symbol+offset references can help
pinpoint a local variable.

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility