----- Original Message ----- > Hi Dave, > > Thanks for your comments. > > (2012/06/19 1:01), Dave Anderson wrote: > > > I don't want to add any new initialization-time code -- especially if > > it's related to the NT_PRSTATUS notes -- that can abort a crash session > > unnecessarily. In your new crash_was_lost_crash_note() function, there > > are these two FAULT_ON_ERROR readmem() calls: > > > > readmem(symbol_value("crash_notes"), KVADDR,&crash_notes_ptr, > > sizeof(ulong), "crash_notes", FAULT_ON_ERROR); > > and > > > > readmem(crash_notes_ptr, KVADDR, buf, SIZE(note_buf), > > "cpu crash_notes", FAULT_ON_ERROR); > > > > Although they are highly unlikely to fail, can you please make > > both of them RETURN_ON_ERROR, and if the readmem() fails, have > > it bail out and return FALSE? > > I see, I'll make consideration about initialization-time rule from now, > to keep crash session, should use readmem() with QUIET|RETURN_ON_ERROR. > > > And then, if necessary, make any adjustments to map_cpus_to_prstatus_kdump_cmprs() > > to handle that remote possibility. You should be able to test it with your > > patched kernel. > > I'm not able to test for unexpected, deliberate readmem() failure > because my core file contains "crash_notes" memory field... > > Although I've probably misunderstood your advice, > I had better add more debugging messages into > map_cpus_to_prstatus_kdump_cmprs() to handle conditions > where imply "not saved NT_PRSTATUS" or "nt_prstatus_percpu[] is > remapped". > > > Also, I don't understand the wording of this error message > > at the end of crash_was_lost_crash_note(): > > > > error(WARNING, "catch lost crash_notes at cpu#%d\n", cpu); > > > > Can you re-word that? The notes were not "lost", but rather were > > "not saved" by the crashing system. > > I re-word all "lost" keywords, so too function name with "saved". > And also make this warning valid only when CRASHDEBUG() is enable > because we can check it later by using "help -D". > > > Lastly, in __diskdump_memory_dump(), you just skip the "lost" > > notes sections: > > > > for (i = 0, j = 0; j< dd->num_prstatus_notes; i++) { > > if (dd->nt_prstatus_percpu[i] == NULL) > > continue; > > fprintf(fp, " notes[%d]: %lx\n", > > i, (ulong)dd->nt_prstatus_percpu[i]); > > j++; > > } > > > > Can you make it more obvious, say, by displaying something like: > > > > notes[6]: (not saved) > > Looks better, thanks for giving good hint. > > [updates by attached patch] > > - display messages only if CRASHDEBUG() >= 1 > crash -d 1 > => display about NT_PRSTATUS mapping messages as below. > ---------------- > WARNING: cpu#0: not saved crash_notes > WARNING: cpu#1: not saved crash_notes > crash: NT_PRSTATUS: cpu#2 = 107a34a8 (remapped from cpu#0) > WARNING: cpu#3: not saved crash_notes > WARNING: cpu#4: not saved crash_notes > WARNING: cpu#5: not saved crash_notes > WARNING: cpu#6: not saved crash_notes > WARNING: cpu#7: not saved crash_notes > ---------------- > > - help message is enhanced > crash> help -D | grep notes > num_prstatus_notes: 1 > notes_buf: 107a34a8 > notes[0]: (not saved) > notes[1]: (not saved) > notes[2]: 107a34a8 > notes[3]: (not saved) > notes[4]: (not saved) > notes[5]: (not saved) > notes[6]: (not saved) > notes[7]: (not saved) > > Thanks, > Toshi OK, now I'm getting confused... I note that currently __diskdump_memory_dump() does this: for (i = 0; i < dd->num_prstatus_notes; i++) { fprintf(fp, " notes[%d]: %lx\n", i, (ulong)dd->nt_prstatus_percpu[i]); } But your patch does this: nrcpus = kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS; for (i = 0; i < nrcpus; i++) { if (dd->nt_prstatus_percpu[i] == NULL) { fprintf(fp, " notes[%d]: (not saved)\n", i); continue; } fprintf(fp, " notes[%d]: %lx\n", i, (ulong)dd->nt_prstatus_percpu[i]); } Now surely we don't want to dump "NR_CPUS" notes pointers, unless there actually are that many cpus in the system. For example, in RHEL6 sets CONFIG_NR_CPUS to 4096 for x86_64, but rarely do we see a dumpfile with that many cpus. So I would prefer that the for loop continue to be bounded by "dd->num_prstatus_notes". However, given that process_elf32_notes() and process_elf64_notes() do a cursory test of each note by checking for the NT_PRSTATUS type, I'm presuming that in your dumpfile, dd->num_prstatus_notes must be equal to 1, correct? void process_elf64_notes(void *note_buf, unsigned long size_note) { Elf64_Nhdr *nt; size_t index, len = 0; int num = 0; for (index = 0; index < size_note; index += len) { nt = note_buf + index; if(nt->n_type == NT_PRSTATUS) { dd->nt_prstatus_percpu[num] = nt; num++; } len = sizeof(Elf64_Nhdr); len = roundup(len + nt->n_namesz, 4); len = roundup(len + nt->n_descsz, 4); } if (num > 0) { pc->flags2 |= ELF_NOTES; dd->num_prstatus_notes = num; } return; } But then map_cpus_to_prstatus_kdump_cmprs() remaps the notes without modifying the dd->num_prstatus_notes counter. So then there's this: void * diskdump_get_prstatus_percpu(int cpu) { if ((cpu < 0) || (cpu >= dd->num_prstatus_notes)) return NULL; return dd->nt_prstatus_percpu[cpu]; } So in a case such as your example, where cpu 2 was the only cpu that saved its notes, the function above would incorrectly return NULL when called with diskdump_get_prstatus_percpu(2). What am I missing? Dave -- Crash-utility mailing list Crash-utility@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/crash-utility