invalid kernel virtual address: cc08 type: "cpu number (per_cpu)"

Bob Montgomery <bob.montgomery@xxxxxx> · Tue, 10 Nov 2009 15:26:05 -0700

I have a dump from a 2.6.31-based x86_64 system where the number of
"possible" cpus equals the system's NR_CPUS (32).  

On that system, the __per_cpu_offset table in the kernel consists of 32
valid offset pointers.

When crash loads this table into its __per_cpu_offset[NR_CPUS=4096]
array in struct kernel_table, it knows the length of the kernel's array
(32*sizeof(long)), and copies the 32 pointers, leaving the rest of its
(much longer) array full of 0x0s.

(This happens in kernel.c)

    if (symbol_exists("__per_cpu_offset")) {
            if (LKCD_KERNTYPES())
                    i = get_cpus_possible();
            else
                    i = get_array_length("__per_cpu_offset", NULL, 0);
            get_symbol_data("__per_cpu_offset",
                    sizeof(long)*((i && (i <= NR_CPUS)) ? i : NR_CPUS),
                    &kt->__per_cpu_offset[0]);
            kt->flags |= PER_CPU_OFF;
    }
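
To make the resulting layout concrete, here is a minimal sketch of that
copy.  It is not crash code: struct kernel_table_sketch, load_offsets()
and the two constants are hypothetical stand-ins, and the clamping is
simplified compared with the excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CRASH_NR_CPUS  4096  /* crash's own compile-time maximum */
#define KERNEL_NR_CPUS 32    /* NR_CPUS of the dumped kernel in this report */

/* Hypothetical stand-in for the relevant part of struct kernel_table. */
struct kernel_table_sketch {
    unsigned long per_cpu_offset[CRASH_NR_CPUS];
};

/*
 * Copy `len` offsets from the kernel's table into the larger array.
 * Everything past the copied entries stays zero, which is the state
 * the loops in x86_64.c later walk into.
 * Returns the number of entries actually copied.
 */
static size_t load_offsets(struct kernel_table_sketch *kt,
                           const unsigned long *kernel_table, size_t len)
{
    assert(kt != NULL && kernel_table != NULL);

    memset(kt->per_cpu_offset, 0, sizeof(kt->per_cpu_offset));
    if (len > CRASH_NR_CPUS)
        len = CRASH_NR_CPUS;   /* never write past crash's array */
    memcpy(kt->per_cpu_offset, kernel_table, len * sizeof(unsigned long));
    return len;
}
```

With a 32-entry kernel table, entries 0..31 hold offsets and entries
32..4095 are all 0x0.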

Later, in a couple of places, crash checks for the maximum valid
__per_cpu_offset by reading the cpu_number value out of each per_cpu
area and comparing it to the expected number until the comparison fails.
(Remember NR_CPUS in crash is much larger than the kernel's NR_CPUS, and
that's OK).

From x86_64.c:

        for (i = cpus = 0; i < NR_CPUS; i++) {
                readmem(symbol_value("per_cpu__cpu_number") +
                        kt->__per_cpu_offset[i], KVADDR,
                        &cpunumber, sizeof(int),
                        "cpu number (per_cpu)", FAULT_ON_ERROR);
                if (cpunumber != cpus)
                        break;
                cpus++;
        }

This works when the kernel's array holds fewer real per_cpu_offsets
than its own NR_CPUS, because the kernel preloads its array with a
pointer (BOOT_PERCPU_OFFSET).  When the loop runs past the real
per_cpu_offset pointers and uses BOOT_PERCPU_OFFSET instead, it reads a
bogus value for cpunumber and terminates.

But when the kernel's table is full of valid per_cpu_offset pointers,
this loop continues off the end of that into the part of crash's
__per_cpu_offset array that has the 0x0 initial values, and dies with:

crash: invalid kernel virtual address: cc08  type: "cpu number (per_cpu)"

The cc08 comes from the symbol_value of per_cpu__cpu_number:
000000000000cc08 D per_cpu__cpu_number
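
The arithmetic behind that address, as a minimal sketch (the helper
name cpu_number_addr() is hypothetical, not a crash function): with a
zero offset, the sum handed to readmem() is just the symbol value.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the address the loop passes to readmem() for
 * one cpu, i.e. symbol_value("per_cpu__cpu_number") plus that cpu's
 * entry in __per_cpu_offset[].
 */
static uint64_t cpu_number_addr(uint64_t sym_value, uint64_t per_cpu_offset)
{
    return sym_value + per_cpu_offset;
}
```

For the first zero entry past the 32 valid ones,
cpu_number_addr(0xcc08, 0) is 0xcc08, which is not a valid kernel
virtual address.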

Bottom line:  crash relies on an unreliable terminator for the kernel's
__per_cpu_offset array (an entry whose per_cpu area yields an unexpected
cpu_number).  When every entry is valid, no such terminator exists.

The included patch adds a second loop termination condition so that
crash doesn't run off the end of what it loaded from the dump.  It
simply checks for a zero (0x0) value in kt->__per_cpu_offset[i].
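
The effect of the extra check can be tried outside crash with a small
simulation.  Everything here is an assumption made for illustration:
fake_read_cpu_number() stands in for readmem(), SKETCH_NR_CPUS is a
tiny replacement for NR_CPUS, and the per_cpu area of cpu N is
pretended to live at offset (N+1)*0x1000.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 8   /* small stand-in for crash's NR_CPUS */

/*
 * Hypothetical stand-in for readmem(): fetch the cpu number stored in
 * the per_cpu area at `offset`.  Returns -1 where crash would report
 * an invalid kernel virtual address (here: a zero or unaligned offset).
 */
static int fake_read_cpu_number(unsigned long offset, int *out)
{
    if (offset == 0 || offset % 0x1000 != 0)
        return -1;
    *out = (int)(offset / 0x1000) - 1;
    return 0;
}

/*
 * Count cpus the way the loops in x86_64.c do.  With guard_zero set,
 * the loop also stops at the first zero offset, as in the patch.
 * *faulted is set to 1 if an invalid read was attempted.
 */
static int count_cpus(const unsigned long *offsets, int guard_zero,
                      int *faulted)
{
    int i, cpus, cpunumber;

    assert(offsets != NULL && faulted != NULL);
    *faulted = 0;

    for (i = cpus = 0; i < SKETCH_NR_CPUS; i++) {
        if (guard_zero && offsets[i] == 0)
            break;                       /* the added termination */
        if (fake_read_cpu_number(offsets[i], &cpunumber) != 0) {
            *faulted = 1;                /* crash would die here */
            break;
        }
        if (cpunumber != cpus)
            break;                       /* the original termination */
        cpus++;
    }
    return cpus;
}
```

Given four valid offsets followed by zeros, the guarded loop stops
cleanly at the first zero entry, while the unguarded loop attempts the
invalid read before it stops.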

Bob Montgomery,
Working at HP

--- x86_64.c.orig	2009-11-10 10:43:54.000000000 -0700
+++ x86_64.c	2009-11-10 10:41:23.000000000 -0700
@@ -791,6 +791,8 @@ x86_64_per_cpu_init(void)
         ms = machdep->machspec;
 
 	for (i = cpus = 0; i < NR_CPUS; i++) {
+		if (kt->__per_cpu_offset[i] == NULL)
+			break;
 		readmem(symbol_value("per_cpu__cpu_number") + 
 			kt->__per_cpu_offset[i],
 			KVADDR, &cpunumber, sizeof(int),
@@ -4199,6 +4201,8 @@ x86_64_get_smp_cpus(void)
 			return 1;
 
 		for (i = cpus = 0; i < NR_CPUS; i++) {
+			if (kt->__per_cpu_offset[i] == NULL)
+				break;
 			readmem(symbol_value("per_cpu__cpu_number") + 
 				kt->__per_cpu_offset[i], KVADDR, 
 				&cpunumber, sizeof(int),