Re: invalid kernel virtual address: cc08 type: "cpu number (per_cpu)"

Dave Anderson <anderson@xxxxxxxxxx> · Wed, 11 Nov 2009 13:54:07 -0500 (EST)

----- "Bob Montgomery" <bob.montgomery@xxxxxx> wrote:

> On Wed, 2009-11-11 at 14:52 +0000, Dave Anderson wrote:
> > ----- "Bob Montgomery" <bob.montgomery@xxxxxx> wrote:
> > 
> > > I have a dump from a 2.6.31-based x86_64 system where the number of
> > > "possible" cpus equals the system's NR_CPUS (32).  
> > > On that system, the __per_cpu_offset table in the kernel consists of 32
> > > valid offset pointers.
> 
> > I have a similar-but-different fix queued for this, but instead of
> > checking for a NULL kt->__per_cpu_offset[i] entry, it changes the
> > readmem() call to RETURN_ON_ERROR|QUIET instead of FAULT_ON_ERROR
> > like this:
> > 
> >                 if (!readmem(symbol_value("per_cpu__cpu_number") +
> >                     kt->__per_cpu_offset[i],
> >                     KVADDR, &cpunumber, sizeof(int),
> >                     "cpu number (per_cpu)", QUIET|RETURN_ON_ERROR))
> >                         break;
> 
> > That should prevent the failure you're seeing.
> 
> I did that first, and thought it was sort of cheating :-)

Sort of.  But at that point in time we're still kind of blindly
wading around in the murk trying to figure out what we're 
running on...
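
To make the return-on-error idea concrete, here is a minimal, self-contained sketch. It is not crash's code: sketch_read() is a hypothetical stand-in for readmem() with QUIET|RETURN_ON_ERROR, the SKETCH_* constants and the tiny "dump" array are invented for illustration, and the loop shape (stop when a read fails or the value is not the expected cpu number) is an assumption about how the counting loop behaves.

```c
#include <stddef.h>

#define SKETCH_SYM   0xcc08UL    /* illustrative symbol value, taken from the subject line */
#define SKETCH_BASE  0x100000UL  /* hypothetical start of the readable region */
#define SKETCH_WORDS 8           /* hypothetical size of that region, in ints */

/* Pretend dump contents: word i holds the cpu number i. */
static const int sketch_mem[SKETCH_WORDS] = {0, 1, 2, 3, 4, 5, 6, 7};

/* Stand-in for a return-on-error read: 1 on success, 0 on failure, never fatal. */
static int sketch_read(unsigned long addr, int *out)
{
    if (addr < SKETCH_BASE ||
        addr >= SKETCH_BASE + SKETCH_WORDS * sizeof(int))
        return 0;                       /* address is not in the dump */
    if ((addr - SKETCH_BASE) % sizeof(int))
        return 0;                       /* misaligned for this toy model */
    *out = sketch_mem[(addr - SKETCH_BASE) / sizeof(int)];
    return 1;
}

/* Count cpus by reading the per-cpu cpu number until a read fails
 * or the value read is not the next expected cpu number. */
static int sketch_count_cpus(unsigned long sym,
                             const unsigned long *offsets, int nr)
{
    int cpus = 0;

    for (int i = 0; i < nr; i++) {
        int cpunumber;

        if (!sketch_read(sym + offsets[i], &cpunumber))
            break;                      /* unreadable address: stop quietly */
        if (cpunumber != cpus)
            break;                      /* not the cpu number we expected */
        cpus++;
    }
    return cpus;
}
```

With three valid offsets followed by a zero entry, the fourth read targets the bare symbol value (0xcc08 here), fails, and the loop simply ends with a count of 3 instead of aborting.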

> 
> > But another question is in the (extremely) rare circumstance of a
> > non-CONFIG_SMP kernel.  In that case, the kt->__per_cpu_offset[] array
> > would be all NULL, and the symbol_value("per_cpu__cpu_number")
> > call would return the qualified unity-mapped address.  So the
> > virtual address calculation should work in x86_64_per_cpu_init(),
> > and the loop wouldn't even be entered in x86_64_get_smp_cpus().
> > 
> > That being said, I don't think I've seen a recent x86_64 kernel
> > that was not compiled CONFIG_SMP, so I can't confirm that it's
> > ever been tested.  
> > 
> > So for sanity's sake, maybe your patch should also be applied,
> > but should also check if the "i" index is non-zero?
> 
> So like this?
> +               if (i && (kt->__per_cpu_offset[i] == NULL))
> +                       break;

Yes.
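
As a rough illustration of what that combined check buys, here is a small standalone sketch. Everything named guard_* is hypothetical and only models the idea: the reader callback stands in for a return-on-error readmem(), and the all-zero offset table is an assumed worst case, not something I have tested against a real dump.

```c
#include <stddef.h>

/* Hypothetical reader: returns 1 and fills *out on success, 0 on failure. */
typedef int (*guard_reader)(unsigned long addr, int *out);

static int guard_reads;  /* how many reads were actually attempted */

/* Toy reader where only the bare symbol address is readable and holds cpu 0. */
static int guard_unity_reader(unsigned long addr, int *out)
{
    guard_reads++;
    if (addr != 0xcc08UL)
        return 0;
    *out = 0;
    return 1;
}

static int guard_count_cpus(unsigned long sym, const unsigned long *offsets,
                            int nr, guard_reader rd)
{
    int cpus = 0;

    for (int i = 0; i < nr; i++) {
        int cpunumber;

        /* Entry 0 is always tried; a zero entry after it ends the scan
         * without attempting a read at all. */
        if (i && offsets[i] == 0)
            break;
        if (!rd(sym + offsets[i], &cpunumber))
            break;                      /* safety net for unreadable addresses */
        if (cpunumber != cpus)
            break;
        cpus++;
    }
    return cpus;
}
```

With an all-zero table the sketch reads exactly once (index 0) and reports one cpu; the return-on-error path is still there for the case where even that first read fails.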

> 
> So it's always ok to try the readmem on the first element of
> the array.  And the RETURN_ON_ERROR would deal with something going
> wrong with that, although that case would presumably be a real
> problem with the dump, right?  (cpus == 0)

Most likely yes.  The motivation for my fix was a failure when
attempting to readmem() a legitimate virtual address that fell in
a page excluded from a makedumpfile-generated dump. If I recall
correctly, it was an in-house kexec-tools bugzilla, but I can't
find it.

Dave
