----- "Bob Montgomery" <bob.montgomery@xxxxxx> wrote:

> On Wed, 2009-11-11 at 14:52 +0000, Dave Anderson wrote:
> > ----- "Bob Montgomery" <bob.montgomery@xxxxxx> wrote:
> >
> > > I have a dump from a 2.6.31-based x86_64 system where the number of
> > > "possible" cpus equals the system's NR_CPUS (32).
> > > On that system, the __per_cpu_offset table in the kernel consists of 32
> > > valid offset pointers.
> >
> > I have a similar-but-different fix queued for this, but instead of
> > checking for a NULL kt->__per_cpu_offset[i] entry, it changes the
> > readmem() call to RETURN_ON_ERROR|QUIET instead of FAULT_ON_ERROR
> > like this:
> >
> >         if (!readmem(symbol_value("per_cpu__cpu_number") +
> >             kt->__per_cpu_offset[i],
> >             KVADDR, &cpunumber, sizeof(int),
> >             "cpu number (per_cpu)", QUIET|RETURN_ON_ERROR))
> >                 break;
> >
> > That should prevent the failure you're seeing.
>
> I did that first, and thought it was sort of cheating :-)

Sort of. But at that point in time we're still kind of blindly
wading around in the murk trying to figure out what we're running on...

> > But another question is in the (extremely) rare circumstance of a
> > non-CONFIG_SMP kernel. In that case, the kt->__per_cpu_offset[] array
> > would be all NULL, and the symbol_value("per_cpu__cpu_number")
> > call would return the qualified unity-mapped address. So the
> > virtual address calculation should work in x86_64_per_cpu_init(),
> > and the loop wouldn't even be entered in x86_64_get_smp_cpus()
> >
> > That being said, I don't think I've seen a recent x86_64 kernel
> > that was not compiled CONFIG_SMP, so I can't confirm that it's
> > ever been tested.
> >
> > So for sanity's sake, maybe your patch should also be applied,
> > but should also check if the "i" index is non-zero?
>
> So like this?
> +       if (i && (kt->__per_cpu_offset[i] == NULL))
> +               break;

Yes.

> So it's always ok to try the readmem on the first element of
> the array.
> And the RETURN_ON_ERROR would deal with something going
> wrong with that, although that case would presumably be a real
> problem with the dump, right? (cpus == 0)

Most likely yes. The motivation for my fix was a failure
attempting to readmem() a legitimate virtual address that was in
an excluded page from a makedumpfile-generated dump. If I recall
correctly, it was an in-house kexec-tools bugzilla, but I can't find it.

Dave
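
For reference, the combined logic discussed above (the "i &&" NULL check plus a read that returns on error instead of faulting) might look roughly like the standalone sketch below. It is not crash source code: mock_readmem(), struct fake_page, and count_cpus() are hypothetical stand-ins invented for illustration, and the counting is simplified to "one cpu per successful read", which is an assumption rather than what x86_64_get_smp_cpus() necessarily does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of dump memory: one int per address, some unreadable
 * (for example, a page excluded by makedumpfile). */
struct fake_page {
    unsigned long addr;
    int value;
    int readable;
};

/* Hypothetical stand-in for readmem() with RETURN_ON_ERROR semantics:
 * returns 1 on success, 0 on failure, and never aborts the session. */
static int mock_readmem(const struct fake_page *mem, size_t n,
                        unsigned long addr, int *out)
{
    for (size_t k = 0; k < n; k++) {
        if (mem[k].addr == addr) {
            if (!mem[k].readable)
                return 0;
            *out = mem[k].value;
            return 1;
        }
    }
    return 0; /* address not present in the dump at all */
}

/* Count cpus by walking the per-cpu offset table.
 * sym plays the role of symbol_value("per_cpu__cpu_number"). */
int count_cpus(const unsigned long *per_cpu_offset, size_t nr_offsets,
               unsigned long sym, const struct fake_page *mem, size_t n)
{
    int cpus = 0;

    assert(per_cpu_offset != NULL);

    for (size_t i = 0; i < nr_offsets; i++) {
        int cpunumber;

        /* A zero offset ends the table, except at index 0, where a
         * non-SMP kernel legitimately has a zero offset. */
        if (i && per_cpu_offset[i] == 0)
            break;

        /* A failed read just ends the scan instead of being fatal. */
        if (!mock_readmem(mem, n, sym + per_cpu_offset[i], &cpunumber))
            break;

        cpus++;
    }
    return cpus;
}
```

With an all-zero offset table and a readable value at sym, count_cpus() returns 1, which models the non-CONFIG_SMP case; if the very first read fails it returns 0, which models the "real problem with the dump" case (cpus == 0).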