[REGRESSION] Minimize per_cpu reservations patch causes NULL ptr deref when some CPUs are offline


At HP we ran into a regression that appears to be caused by:

commit 2c6e6db41f01b6b4eb98809350827c9678996698
Author: holt@xxxxxxx <holt@xxxxxxx>
Date:   Thu Apr 3 15:17:13 2008 -0500

    [IA64] Minimize per_cpu reservations.


This is seen when we have one or more CPUs deconfigured via "cpuconfig"
at the EFI shell, or on systems that have an unpopulated CPU socket.  We
hit this traceback (edited for clarity):

Unable to handle kernel NULL pointer dereference (address 0000000000000040)
 [<a000000100618bc0>] _spin_lock+0x20/0x60
 [<a0000001000811b0>] __enable_runtime+0x70/0x160
 [<a000000100086a50>] rq_online_rt+0x110/0x180
 [<a00000010007e070>] set_rq_online+0x110/0x160
 [<a00000010060db90>] migration_call+0x290/0xc00
 [<a000000100807060>] migration_init+0xc0/0x100
 [<a00000010000a960>] do_one_initcall+0xa0/0x2a0
 [<a0000001007f0310>] kernel_init+0xd0/0x5a0


The problem is that the per-cpu data structures are used but were never set
up for the offline CPUs.

In setup_arch() we call this bit of code, which should add the offline CPUs to
early_cpu_possible_map:


        per_cpu_scan_finalize((cpus_weight(early_cpu_possible_map) == 0 ?
                32 : cpus_weight(early_cpu_possible_map)),
                additional_cpus > 0 ? additional_cpus : 0);


However, additional_cpus has not been set yet.  It _can_ be set by the user as
a command line parameter, but it is normally set by prefill_possible_map(),
which is called later.  At the point where per_cpu_scan_finalize() is called we
_cannot_ know what additional_cpus is because we have not yet gotten that info
from ACPI.  That happens when we call acpi_boot_init(), which happens later in
setup_arch().
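
To make the ordering problem concrete, here is a small user-space model (not
kernel code).  All model_* names are made up for illustration, and I am
assuming the two arguments are simply added together and capped, which is a
simplification of whatever per_cpu_scan_finalize() really does:

```c
#include <assert.h>

#define MODEL_NR_CPUS 64   /* assumed upper bound, stands in for NR_CPUS */

/* Stand-in for additional_cpus; -1 means "not set yet" (an assumption). */
static int model_additional_cpus = -1;

/*
 * Computes the same two arguments as the setup_arch() call quoted above,
 * then (assumption) adds them and caps the sum to get the number of CPUs
 * that would be covered.
 */
static int model_cpus_covered(int cpus_in_early_map)
{
    assert(cpus_in_early_map >= 0);

    int min_cpus = cpus_in_early_map == 0 ? 32 : cpus_in_early_map;
    int reserve  = model_additional_cpus > 0 ? model_additional_cpus : 0;
    int total    = min_cpus + reserve;

    return total > MODEL_NR_CPUS ? MODEL_NR_CPUS : total;
}

/* Stand-in for the later point in boot where the offline count is known. */
static void model_learn_offline_cpus(int offline_cpus)
{
    model_additional_cpus = offline_cpus;
}
```

With 6 CPUs in the early map and 2 offline, model_cpus_covered(6) gives 6 if
it runs before model_learn_offline_cpus(2) and 8 if it runs after.  The real
call is in the "before" position.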

It appears that per_cpu_scan_finalize() is called at its current location
because it needs to run prior to find_memory().  The "Minimize per_cpu
reservations." patch changes the code in arch/ia64/mm/discontig.c that sets up
the per-cpu data structures so that it only sets them up for CPUs in
early_cpu_possible_map.  The problem is that we don't have the offline CPUs in
that map yet.  Later code tries to use these data structures for the offline
CPUs and we hit the NULL pointer dereference.
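
The failure mode can be modelled in user space roughly like this.  It is only
a sketch: the model_* names, the 8-CPU limit and the bitmask representation
are all assumptions for illustration, not what discontig.c does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 8            /* assumed size, for illustration only */

struct model_rq { int lock; };     /* stand-in for a per-cpu runqueue */

static struct model_rq *model_percpu_area[MODEL_NR_CPUS];

/* Allocate an area only for CPUs whose bit is set in possible_mask. */
static void model_setup_percpu(unsigned int possible_mask)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (!(possible_mask & (1u << cpu)) || model_percpu_area[cpu])
            continue;
        model_percpu_area[cpu] = calloc(1, sizeof(struct model_rq));
        if (!model_percpu_area[cpu])
            abort();
    }
}

/* Later users look up the area for any CPU, including offline ones. */
static struct model_rq *model_cpu_rq(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return model_percpu_area[cpu];   /* NULL if the CPU was not in the mask */
}
```

If the mask passed to model_setup_percpu() only has the online CPUs in it,
model_cpu_rq() hands back NULL for an offline CPU, and taking a lock inside
that structure is the kind of access that shows up in the traceback.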

So, it seems the options to fix this that I can think of are:

1) Try to get the info regarding offline CPUs earlier, either by calling
acpi_boot_init() earlier (probably not possible prior to calling find_memory())
or by walking a subset of the ACPI tables to get the number of offline CPUs
sooner (which sounds ugly).


2) After we call acpi_boot_init(), go back and set up the per-cpu data
structures for the offline CPUs then.  Seems like this might be cleaner but I
have not investigated the specifics.
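
The general shape of option 2, again as a user-space sketch only.  The
sketch_* names and sizes are hypothetical, and this ignores everything that
makes it hard in the kernel (boot-time allocation, node-local placement):

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS   8     /* assumed size, for illustration only */
#define SKETCH_AREA_SIZE 64    /* assumed per-cpu area size in bytes */

static void *sketch_area[SKETCH_NR_CPUS];

/*
 * Allocate areas for CPUs in possible_mask that do not have one yet.
 * Safe to call a second time once the mask has grown; returns the
 * number of areas added by this call.
 */
static int sketch_alloc_missing(unsigned int possible_mask)
{
    int added = 0;

    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        if (!(possible_mask & (1u << cpu)) || sketch_area[cpu])
            continue;
        sketch_area[cpu] = calloc(1, SKETCH_AREA_SIZE);
        if (!sketch_area[cpu])
            abort();
        added++;
    }
    return added;
}
```

The first call would run with the early map and the second after the offline
CPUs are known, so only the missing areas get filled in on the second pass.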


I will start looking at option #2, but I certainly welcome suggestions.
Perhaps there is an easy fix?

thanks,

- Doug

