Re: mmotm 2009-07-16-14-32 - sudden OOPS at boot in ACPI code

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> From: Hugh Dickins <hugh.dickins@xxxxxxxxxxxxx>
> Date: Tue, Jul 21, 2009 at 11:33 AM
> Subject: Re: mmotm 2009-07-16-14-32 - sudden OOPS at boot in ACPI code
> To: Valdis.Kletnieks@xxxxxx
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Bob Moore
> <robert.moore@xxxxxxxxx>, Len Brown <lenb@xxxxxxxxxx>,
> linux-kernel@xxxxxxxxxxxxxxx, linux-acpi@xxxxxxxxxxxxxxx
> 
> 
> On Mon, 20 Jul 2009, Valdis.Kletnieks@xxxxxx wrote:
> > On Thu, 16 Jul 2009 14:34:02 PDT, akpm@xxxxxxxxxxxxxxxxxxxx said:
> > > The mm-of-the-moment snapshot 2009-07-16-14-32 has been uploaded to
> >
> > Dies a horrid death during early boot. Dell Latitude D820, and this graphics:
> >
> > 01:00.0 VGA compatible controller: nVidia Corporation G72M [Quadro NVS 110M/GeForce Go 7300] (rev a1)
> 
> Oh yes, I was getting just the same with Intel graphics (i915);
> but promptly forgot about it once I'd a workaround in place,
> and moved on to other things, sorry.
> 
> >
> > Traceback (hand-copied from a very crappy cell-phone picture)
> >
> > strcmp+0x4/0x1f
> > acpi_device+probe+0xac/0x13e
> > driver_probe_device+0xc9/0x14e
> > __driver_attach+0x58/0x7c
> > ? __driver_attach+0x58/0x7c
> > ? __driver_attach+0x58/0x7c
> > bus_for_each_dev+0x54/0x89
> > driver_attach+0x19/0x1b
> > bus_add_driver+0xv4/0x1fe
> > driver_register+0xb7/0x128
> > ? acpi_video_init+0x0/0x17
> > acpi_bus_register_driver+0x3e/0x42
> > acpi_video_register+0x42/0x6e
> > acpi_video_init+0x15/0x17
> > do_one_initcall+0x56/0x130
> >
> > Analysis shows it's the following code from (inlined) acpi_device_install_notify_handler
> >
> > static int acpi_device_install_notify_handler(struct acpi_device *device)
> > {
> >         acpi_status status;
> >         char *hid;
> >
> >         hid = acpi_device_hid(device);
> >         if (!strcmp(hid, ACPI_BUTTON_HID_POWERF))
> >
> > but we never check if hid is non-trash before feeding it to strcmp.  Looks
> > like something in this linux-next commit is involved:
> >
> > commit ed444824932d2a563858d82ec1ea29b0aa775e91
> > Author: Bob Moore <robert.moore@xxxxxxxxx>
> > Date:   Mon Jun 29 13:39:29 2009 +0800
> >
> > I suspect something in acpi_get_object_info() is going astray, causing
> > acpi_device_set_id() to set the ->pnp.hardware_id to NULL in this code:
> >
> >         if (hid) {
> >                 device->pnp.hardware_id = ACPI_ALLOCATE_ZEROED(strlen (hid) + 1);
> >                 if (device->pnp.hardware_id) {
> >                         strcpy(device->pnp.hardware_id, hid);
> >                         device->flags.hardware_id = 1;
> >                 }
> >         } else
> >                 device->pnp.hardware_id = NULL;
> >
> > The else clause is new in this commit.
> 
> I think pnp.hardware_id has changed from being a builtin array to
> an allocated pointer: so before there was always a zeroed array to

Yes, pnp.hardware_id and pnp.unique_id are now allocated pointer.
We made the change for acpi_get_object_info interface.

> strcmp against, whereas now there's a NULL pointer if you come to
> use acpi_device_install_notify_handler() "too early".
> 
> Patch that works for me at the bottom.

Yes, 
your patch can workaround the problem in
acpi_device_install_notify_handler.

But there are other places call strcmp to compare HID/UID.
So we'd better fix acpi_device_hid/_uid as below,

diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 6e83a68..6c64366 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -188,8 +188,8 @@ struct acpi_device_pnp {
 
 #define acpi_device_bid(d)	((d)->pnp.bus_id)
 #define acpi_device_adr(d)	((d)->pnp.bus_address)
-#define acpi_device_hid(d)	((d)->pnp.hardware_id)
-#define acpi_device_uid(d)	((d)->pnp.unique_id)
+#define acpi_device_hid(d)	((d)->pnp.hardware_id ? (d)->pnp.hardware_id : "\0")
+#define acpi_device_uid(d)	((d)->pnp.unique_id ? (d)->pnp.unique_id : "\0")
 #define acpi_device_name(d)	((d)->pnp.device_name)
 #define acpi_device_class(d)	((d)->pnp.device_class)
 
---
Thanks,
Lin Ming

> 
> >
> > Looking at the old code, it *may* be that the ACPI code on my laptop is just
> > busticated and/or there's no _HID method for the graphics card, and the old
> > code Just Happened To Work in previous kernels because ->pnp.hardware_id
> > wouldn't actually get set *at all* in acpi_device_set_id, so we'd get random
> > stale data that was bogus, but didn't give strcmp() indigestion...
> >
> > Any wisdom on debugging this further (including how to tell if the ACPI
> > tables have a sane _HID method for the graphics card) would be appreciated...
> >
> > Or is the correct fix in fact to just add a 'if (!hid) return -EINVAL;' to
> > acpi_device_install_notify_handler()?
> 
> [PATCH mmotm] acpi: work around NULL hardware_id
> 
> Work around NULL pnp.hardware_id in acpi_device_install_notify_handler()
> when probing video device.
> 
> Signed-off-by: Hugh Dickins <hugh.dickins@xxxxxxxxxxxxx>
> ---
> Signoff provided to handle the unlikely event that this hack
> is actually the right fix!
> 
>  drivers/acpi/scan.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> --- mmotm/drivers/acpi/scan.c   2009-07-17 12:53:20.000000000 +0100
> +++ linux/drivers/acpi/scan.c   2009-07-17 21:19:10.000000000 +0100
> @@ -376,12 +376,12 @@ static int acpi_device_install_notify_ha
>        char *hid;
> 
>        hid = acpi_device_hid(device);
> -       if (!strcmp(hid, ACPI_BUTTON_HID_POWERF))
> +       if (hid && !strcmp(hid, ACPI_BUTTON_HID_POWERF))
>                status =
>                    acpi_install_fixed_event_handler(ACPI_EVENT_POWER_BUTTON,
>                                                     acpi_device_notify_fixed,
>                                                     device);
> -       else if (!strcmp(hid, ACPI_BUTTON_HID_SLEEPF))
> +       else if (hid && !strcmp(hid, ACPI_BUTTON_HID_SLEEPF))
>                status =
>                    acpi_install_fixed_event_handler(ACPI_EVENT_SLEEP_BUTTON,
>                                                     acpi_device_notify_fixed,
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux