On Mon, Jun 11, 2012 at 01:13:20PM +0800, Ming Lei wrote: > Firstly, .shutdown callback may touch a uninitialized hardware > if dev->driver is set and .probe is not completed. > > Secondly, device_shutdown() may dereference a null pointer to cause > oops when dev->driver is cleared after it is checked in > device_shutdown(). > > So just try to hold device lock and its parent lock(if it has) to > fix the races. > > Cc: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx Why stable? Are there known systems that crash right now without this change? I don't think we ever heard back from the original poster about this issue as to what exactly was going wrong. > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxxxxx> > --- > v2: > - take Alan's suggestion to use device_trylock to avoid > hanging during shutdown by buggy device or driver > - hold parent reference counter > > drivers/base/core.c | 32 ++++++++++++++++++++++++++++++++ > 1 file changed, 32 insertions(+) > > diff --git a/drivers/base/core.c b/drivers/base/core.c > index 346be8b..f2fc989 100644 > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -1796,6 +1796,16 @@ out: > } > EXPORT_SYMBOL_GPL(device_move); > > +static int __try_lock(struct device *dev) > +{ > + int i = 0; > + > + while (!device_trylock(dev) && i++ < 100) > + msleep(10); > + > + return i < 100; > +} That's a totally arbritary time, why does this work and other times do not? And what is this returning, if the lock was grabbed successfully? What's with the __ naming? I really don't like this at all. > + > /** > * device_shutdown - call ->shutdown() on each device to shutdown. > */ > @@ -1810,8 +1820,11 @@ void device_shutdown(void) > * devices offline, even as the system is shutting down. > */ > while (!list_empty(&devices_kset->list)) { > + int nonlocked; > + > dev = list_entry(devices_kset->list.prev, struct device, > kobj.entry); > + get_device(dev->parent); Why grab the parent reference? > get_device(dev); > /* > * Make sure the device is off the kset list, in the > @@ -1820,6 +1833,18 @@ void device_shutdown(void) > list_del_init(&dev->kobj.entry); > spin_unlock(&devices_kset->list_lock); > > + /* hold lock to avoid race with .probe/.release */ > + if (dev->parent && !__try_lock(dev->parent)) > + nonlocked = 2; > + else if (!__try_lock(dev)) > + nonlocked = 1; > + else > + nonlocked = 0; Ick ick ick. Why can't we just grab the lock to try to only call these callbacks one at a time? What is causing the big problem here that I am missing? > + > + if (nonlocked) > + dev_err(dev, "can't hold %slock for shutdown\n", > + nonlocked == 1 ? "" : "parent "); What can anyone do with this message? I sure wouldn't know what to do with it, do you? If so, what? greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html