----- Original Message -----
> The purpose of this patch is to work out "bt" command for the diskdump
> which NT_PRSTATUS note could not be saved by IPI lost.
> I think IPI is possibly lost by panic under the serious crashed condition.
>
> I noticed that "bt" failed in my ppc environment
> when the NT_PRSTATUS notes are lost on some CPUs while IPI delivery.
> Then, I made CPU map for prstatus in diskdump more correctable
> by checking a validation of crash_notes field.
>
> I've tested this problem by patching kernel like, kernel/kexec.c
>  void crash_save_cpu(struct pt_regs *regs, int cpu)
>  {
> +       if (current->pid == 0)
> +               /* this cpu was idle; nothing to capture */
> +               return;
>
> It looks terrible and impractical test case but actually I met this code
> in my using distro's kernel. I couldn't reproduce actual IPI lost case,
> then fortunately, use this as a example of the causes if IPI could not be
> delivered to other CPUs.
>
> => Taking diskdump by sysrq+c and makedumpfile.
>
> crash> help -D | grep notes
>   num_prstatus_notes: 1
>   notes_buf: 10ba91a8
>   notes[0]: 10ba91a8
> crash> help -k | grep cpus
>   cpus: 8
>   cpus_override: (null)
> crash> bt
> PID: 1001  TASK: ea62b000  CPU: 2  COMMAND: "bash"
> Segmentation fault
>
> Since seven idle cpus did not save NT_PRSTATUS note,
> crash could not handle CPU#2's note where is located as CPU#0's.
>
> With this patch, crash get to work out with correct CPU map to
> prstatus.
>
> WARNING: catch lost crash_notes at cpu#0
> WARNING: catch lost crash_notes at cpu#1
> WARNING: catch lost crash_notes at cpu#3
> WARNING: catch lost crash_notes at cpu#4
> WARNING: catch lost crash_notes at cpu#5
> WARNING: catch lost crash_notes at cpu#6
> WARNING: catch lost crash_notes at cpu#7
> crash.fix> help -D | grep notes
>   num_prstatus_notes: 1
>   notes_buf: 107a3378
>   notes[2]: 107a3378
> crash.fix> help -k | grep cpus
>   cpus: 8
>   cpus_override: (null)
> crash.fix> bt
> PID: 1001  TASK: ea62b000  CPU: 2  COMMAND: "bash"
>
>  R0:  00000001   R1:  eb793e60   R2:  ea62b000   R3:  00000063
>  R4:  00000000   R5:  ffffffff   R6:  c043ba2c   R7:  00000000
>  R8:  00008000   R9:  00000000   R10: 00000000   R11: eb793e70
>  R12: 28242444   R13: 100b8448   R14: 100b07b8   R15: 100b0894
>  R16: 00000000   R17: 00000000   R18: 00000000   R19: 1006d270
>  R20: 00000000   R21: 100f0430   R22: 00000000   R23: 00000001
>  R24: c08f1ac8   R25: 00029002   R26: c08f1bac   R27: c08d0000
>  R28: 00000000   R29: c09ada48   R30: 00000063   R31: eb793e60
>  NIP: c0423378   MSR: 00021002   OR3: c09ada48   CTR: c0423344
>  LR:  c0423d8c   XER: 00000000   CCR: 28242444   MQ:  00008000
>  DAR: 00000000   DSISR: 00800000   Syscall Result: eb793e60
>  NIP [00000000c0423378] sysrq_handle_crash
>  LR  [00000000c0423d8c] __handle_sysrq
>
>  #0 [eb793e60] sysrq_handle_crash at c0423378
>   : snip
>
> Thanks,
> Toshi

Toshi,

I don't want to add any new initialization-time code -- especially
if it's related to the NT_PRSTATUS notes -- that can abort a crash
session unnecessarily.

In your new crash_was_lost_crash_note() function, there are these
two FAULT_ON_ERROR readmem() calls:

  readmem(symbol_value("crash_notes"), KVADDR, &crash_notes_ptr,
          sizeof(ulong), "crash_notes", FAULT_ON_ERROR);

and

  readmem(crash_notes_ptr, KVADDR, buf, SIZE(note_buf),
          "cpu crash_notes", FAULT_ON_ERROR);

Although they are highly unlikely to fail, can you please make both
of them RETURN_ON_ERROR, and if the readmem() fails, have it bail
out and return FALSE?
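
The bail-out pattern requested here can be modelled outside of crash
as a rough sketch. Everything in it is a hypothetical stand-in:
fake_readmem() only imitates a readmem() call made with
RETURN_ON_ERROR (it reports failure instead of aborting),
struct fake_dump is a flat byte array rather than a real dumpfile,
and the "all zero bytes means not saved" check is an assumption made
for the example, not necessarily the validation the patch performs.

```c
#include <stddef.h>
#include <string.h>

#define TRUE  1
#define FALSE 0
#define NOTE_MAX 64     /* hypothetical upper bound on a note buffer */

/* Hypothetical dump image: a flat byte array addressed by offset. */
struct fake_dump {
    const unsigned char *mem;
    size_t size;
};

/*
 * Stand-in for a readmem() call made with RETURN_ON_ERROR: a failed
 * read is reported to the caller instead of ending the session.
 */
static int fake_readmem(const struct fake_dump *d, size_t addr,
                        void *buf, size_t len)
{
    if (addr > d->size || len > d->size - addr)
        return FALSE;
    memcpy(buf, d->mem + addr, len);
    return TRUE;
}

/*
 * Returns TRUE only when the note could be read and looks unsaved.
 * Assumption for this sketch: an unsaved note is all zero bytes.
 * Any read failure bails out with FALSE.
 */
static int note_was_not_saved(const struct fake_dump *d, size_t ptr_addr,
                              size_t note_len)
{
    unsigned char buf[NOTE_MAX];
    size_t note_addr, i;

    if (note_len > NOTE_MAX)
        return FALSE;
    if (!fake_readmem(d, ptr_addr, &note_addr, sizeof(note_addr)))
        return FALSE;           /* could not read the pointer */
    if (!fake_readmem(d, note_addr, buf, note_len))
        return FALSE;           /* could not read the note itself */

    for (i = 0; i < note_len; i++)
        if (buf[i] != 0)
            return FALSE;       /* something was saved */
    return TRUE;
}
```

Returning FALSE on a failed read means the caller treats the note as
present, so an unreadable crash_notes area would leave the CPU map as
it was before the check rather than stopping initialization.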
And then, if necessary, make any adjustments to
map_cpus_to_prstatus_kdump_cmprs() to handle that remote possibility.
You should be able to test it with your patched kernel.

Also, I don't understand the wording of this error message at the
end of crash_was_lost_crash_note():

  error(WARNING, "catch lost crash_notes at cpu#%d\n", cpu);

Can you re-word that? The notes were not "lost", but rather were
"not saved" by the crashing system.

Lastly, in __diskdump_memory_dump(), you just skip the "lost" notes
sections:

  for (i = 0, j = 0; j < dd->num_prstatus_notes; i++) {
          if (dd->nt_prstatus_percpu[i] == NULL)
                  continue;
          fprintf(fp, " notes[%d]: %lx\n",
                  i, (ulong)dd->nt_prstatus_percpu[i]);
          j++;
  }

Can you make it more obvious, say, by displaying something like:

  notes[6]: (not saved)

Thanks,
  Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
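
One possible shape for the display change requested for
__diskdump_memory_dump() is sketched below as a standalone function.
The names dump_prstatus_notes(), percpu and ncpus are hypothetical
stand-ins (for the loop, dd->nt_prstatus_percpu and the CPU count),
and the sketch assumes it is acceptable to walk every CPU slot rather
than stopping once num_prstatus_notes saved notes have been printed.

```c
#include <stdio.h>

/*
 * Print one line per CPU slot. A NULL slot is shown as "(not saved)"
 * instead of being skipped silently. Returns the number of saved
 * notes that were printed.
 *
 * percpu and ncpus are hypothetical parameters for this sketch.
 */
static int dump_prstatus_notes(FILE *fp, void *const *percpu, int ncpus)
{
    int i, saved = 0;

    for (i = 0; i < ncpus; i++) {
        if (percpu[i] == NULL) {
            fprintf(fp, "notes[%d]: (not saved)\n", i);
            continue;
        }
        fprintf(fp, "notes[%d]: %lx\n", i, (unsigned long)percpu[i]);
        saved++;
    }
    return saved;
}
```

Walking all slots also lists unsaved notes that follow the last saved
one, which a loop bounded by the saved-note count would never reach.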