> From: Hugh Dickins <hugh.dickins@xxxxxxxxxxxxx>
> Date: Tue, Jul 21, 2009 at 11:33 AM
> Subject: Re: mmotm 2009-07-16-14-32 - sudden OOPS at boot in ACPI code
> To: Valdis.Kletnieks@xxxxxx
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Bob Moore
> <robert.moore@xxxxxxxxx>, Len Brown <lenb@xxxxxxxxxx>,
> linux-kernel@xxxxxxxxxxxxxxx, linux-acpi@xxxxxxxxxxxxxxx
>
>
> On Mon, 20 Jul 2009, Valdis.Kletnieks@xxxxxx wrote:
> > On Thu, 16 Jul 2009 14:34:02 PDT, akpm@xxxxxxxxxxxxxxxxxxxx said:
> > > The mm-of-the-moment snapshot 2009-07-16-14-32 has been uploaded to
> >
> > Dies a horrid death during early boot. Dell Latitude D820, and this graphics:
> >
> > 01:00.0 VGA compatible controller: nVidia Corporation G72M [Quadro NVS 110M/GeForce Go 7300] (rev a1)
>
> Oh yes, I was getting just the same with Intel graphics (i915);
> but promptly forgot about it once I'd a workaround in place,
> and moved on to other things, sorry.
>
> >
> > Traceback (hand-copied from a very crappy cell-phone picture)
> >
> > strcmp+0x4/0x1f
> > acpi_device_probe+0xac/0x13e
> > driver_probe_device+0xc9/0x14e
> > __driver_attach+0x58/0x7c
> > ? __driver_attach+0x58/0x7c
> > ? __driver_attach+0x58/0x7c
> > bus_for_each_dev+0x54/0x89
> > driver_attach+0x19/0x1b
> > bus_add_driver+0xv4/0x1fe
> > driver_register+0xb7/0x128
> > ? acpi_video_init+0x0/0x17
> > acpi_bus_register_driver+0x3e/0x42
> > acpi_video_register+0x42/0x6e
> > acpi_video_init+0x15/0x17
> > do_one_initcall+0x56/0x130
> >
> > Analysis shows it's the following code from (inlined) acpi_device_install_notify_handler
> >
> > static int acpi_device_install_notify_handler(struct acpi_device *device)
> > {
> >         acpi_status status;
> >         char *hid;
> >
> >         hid = acpi_device_hid(device);
> >         if (!strcmp(hid, ACPI_BUTTON_HID_POWERF))
> >
> > but we never check if hid is non-trash before feeding it to strcmp.  Looks
> > like something in this linux-next commit is involved:
> >
> > commit ed444824932d2a563858d82ec1ea29b0aa775e91
> > Author: Bob Moore <robert.moore@xxxxxxxxx>
> > Date:   Mon Jun 29 13:39:29 2009 +0800
> >
> > I suspect something in acpi_get_object_info() is going astray, causing
> > acpi_device_set_id() to set the ->pnp.hardware_id to NULL in this code:
> >
> >         if (hid) {
> >                 device->pnp.hardware_id = ACPI_ALLOCATE_ZEROED(strlen (hid) + 1);
> >                 if (device->pnp.hardware_id) {
> >                         strcpy(device->pnp.hardware_id, hid);
> >                         device->flags.hardware_id = 1;
> >                 }
> >         } else
> >                 device->pnp.hardware_id = NULL;
> >
> > The else clause is new in this commit.
>
> I think pnp.hardware_id has changed from being a builtin array to
> an allocated pointer: so before there was always a zeroed array to

Yes, pnp.hardware_id and pnp.unique_id are now allocated pointers.
We made that change for the acpi_get_object_info interface.

> strcmp against, whereas now there's a NULL pointer if you come to
> use acpi_device_install_notify_handler() "too early".
>
> Patch that works for me at the bottom.

Yes, your patch works around the problem in
acpi_device_install_notify_handler. But there are other places that call
strcmp to compare HID/UID, so we'd better fix acpi_device_hid/_uid
themselves, as below:

diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 6e83a68..6c64366 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -188,8 +188,8 @@ struct acpi_device_pnp {
 #define acpi_device_bid(d)	((d)->pnp.bus_id)
 #define acpi_device_adr(d)	((d)->pnp.bus_address)
-#define acpi_device_hid(d)	((d)->pnp.hardware_id)
-#define acpi_device_uid(d)	((d)->pnp.unique_id)
+#define acpi_device_hid(d)	((d)->pnp.hardware_id ? (d)->pnp.hardware_id : "\0")
+#define acpi_device_uid(d)	((d)->pnp.unique_id ? (d)->pnp.unique_id : "\0")
 #define acpi_device_name(d)	((d)->pnp.device_name)
 #define acpi_device_class(d)	((d)->pnp.device_class)

---

Thanks,
Lin Ming

> >
> > Looking at the old code, it *may* be that the ACPI code on my laptop is just
> > busticated and/or there's no _HID method for the graphics card, and the old
> > code Just Happened To Work in previous kernels because ->pnp.hardware_id
> > wouldn't actually get set *at all* in acpi_device_set_id, so we'd get random
> > stale data that was bogus, but didn't give strcmp() indigestion...
> >
> > Any wisdom on debugging this further (including how to tell if the ACPI
> > tables have a sane _HID method for the graphics card) would be appreciated...
> >
> > Or is the correct fix in fact to just add a 'if (!hid) return -EINVAL;' to
> > acpi_device_install_notify_handler()?
>
> [PATCH mmotm] acpi: work around NULL hardware_id
>
> Work around NULL pnp.hardware_id in acpi_device_install_notify_handler()
> when probing video device.
>
> Signed-off-by: Hugh Dickins <hugh.dickins@xxxxxxxxxxxxx>
> ---
> Signoff provided to handle the unlikely event that this hack
> is actually the right fix!
>
>  drivers/acpi/scan.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- mmotm/drivers/acpi/scan.c	2009-07-17 12:53:20.000000000 +0100
> +++ linux/drivers/acpi/scan.c	2009-07-17 21:19:10.000000000 +0100
> @@ -376,12 +376,12 @@ static int acpi_device_install_notify_ha
>  	char *hid;
>
>  	hid = acpi_device_hid(device);
> -	if (!strcmp(hid, ACPI_BUTTON_HID_POWERF))
> +	if (hid && !strcmp(hid, ACPI_BUTTON_HID_POWERF))
>  		status =
>  		    acpi_install_fixed_event_handler(ACPI_EVENT_POWER_BUTTON,
>  						     acpi_device_notify_fixed,
>  						     device);
> -	else if (!strcmp(hid, ACPI_BUTTON_HID_SLEEPF))
> +	else if (hid && !strcmp(hid, ACPI_BUTTON_HID_SLEEPF))
>  		status =
>  		    acpi_install_fixed_event_handler(ACPI_EVENT_SLEEP_BUTTON,
>  						     acpi_device_notify_fixed,
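
For anyone comparing the two approaches discussed in this thread, here is a minimal userspace sketch of the accessor-level fix. It is not kernel code: struct fake_acpi_device, the fake_device_hid/fake_device_uid macros, is_power_button() and check_examples() are hypothetical stand-ins invented for illustration, and the "LNXPWRBN" string is assumed to match ACPI_BUTTON_HID_POWERF. The sketch uses "" as the fallback, which behaves the same under strcmp as the "\0" used in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the pnp part of struct acpi_device. */
struct fake_pnp {
	const char *hardware_id;	/* NULL when no _HID was found */
	const char *unique_id;		/* NULL when no _UID was found */
};

struct fake_acpi_device {
	struct fake_pnp pnp;
};

/* NULL-safe accessors, modeled on the macros proposed in the diff. */
#define fake_device_hid(d) ((d)->pnp.hardware_id ? (d)->pnp.hardware_id : "")
#define fake_device_uid(d) ((d)->pnp.unique_id ? (d)->pnp.unique_id : "")

/* Assumed to match ACPI_BUTTON_HID_POWERF; illustrative only. */
#define FAKE_BUTTON_HID_POWERF "LNXPWRBN"

/* A caller that compares the HID without checking for NULL itself. */
int is_power_button(const struct fake_acpi_device *dev)
{
	return strcmp(fake_device_hid(dev), FAKE_BUTTON_HID_POWERF) == 0;
}

/* Small self-check covering a device with no _HID and one with a match. */
void check_examples(void)
{
	struct fake_acpi_device video = { { NULL, NULL } };
	struct fake_acpi_device button = { { "LNXPWRBN", NULL } };

	assert(!is_power_button(&video));	/* NULL HID: no crash, no match */
	assert(is_power_button(&button));	/* matching HID */
	assert(strcmp(fake_device_uid(&button), "") == 0);
}
```

The trade-off, as far as I can tell: fixing the accessor protects every strcmp caller at once, while the per-caller 'hid &&' check protects only the one function. The cost is that a caller going through the accessor can no longer tell a missing _HID from an empty one, and would have to look at pnp.hardware_id (or flags.hardware_id) directly if that matters.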