On Thu, 1 Sep 2016, Viresh Kumar wrote: > On 31-08-16, 12:46, Alan Stern wrote: > > On Wed, 31 Aug 2016, Viresh Kumar wrote: > > > > > On 05-08-16, 11:51, Alan Stern wrote: > > > > +++ usb-4.x/drivers/usb/core/hub.c > > > > @@ -1052,7 +1052,7 @@ static void hub_activate(struct usb_hub > > > > > > > > /* Continue a partial initialization */ > > > > if (type == HUB_INIT2 || type == HUB_INIT3) { > > > > - device_lock(hub->intfdev); > > > > + device_lock(&hdev->dev); > > > > > > Hi Alan, > > > > > > I have received reports of kernel crashes (NULL dereference) due to this patch > > > in some of the corner cases. Note that we have backported this patch (and few > > > other) to 3.10 kernel. I have attached my hub.c file as well for reference. > > > > > > Here is the reported kernel OOPs: > > > > > > [ 19.476228] Unable to handle kernel NULL pointer dereference at virtual address 00000000 > > > [ 19.476231] pgd = ffffffc00007d000 > > > [ 19.476236] [00000000] *pgd=000000000e90b003, *pmd=000000000e90c003, *pte=00e00000f9000407 > > > [ 19.476242] Internal error: Oops: 96000045 [#1] PREEMPT SMP > > > [ 19.476273] Modules linked in: gb_vibrator(O) gb_usb(O) gb_uart(O) gb_spi(O) gb_sdio(O) gb_raw(O) gb_pwm(O) gb_power_supply(O) gb) > > > [ 19.476279] CPU: 0 PID: 344 Comm: kworker/0:3 Tainted: G O 3.10.97-g4b7224f-dirty #454 > > > [ 19.476290] Workqueue: events hub_init_func2 > > > [ 19.476293] task: ffffffc09b3560c0 ti: ffffffc09ada8000 task.ti: ffffffc09ada8000 > > > [ 19.476300] PC is at __mutex_lock_slowpath+0x138/0x224 > > > [ 19.476303] LR is at __mutex_lock_slowpath+0x128/0x224 > > ... > > > [ 19.476582] Call trace: > > > [ 19.476586] [<ffffffc000ccf13c>] __mutex_lock_slowpath+0x138/0x224 > > > [ 19.476590] [<ffffffc000ccf254>] mutex_lock+0x2c/0x48 > > > [ 19.476593] [<ffffffc0006f6eac>] hub_activate+0x50/0x4d8 > > > [ 19.476596] [<ffffffc0006f7388>] hub_init_func2+0x14/0x1c > > > [ 19.476602] [<ffffffc0002387ac>] process_one_work+0x26c/0x3cc > > > [ 19.476605] [<ffffffc000239988>] worker_thread+0x208/0x358 > > > [ 19.476610] [<ffffffc00023f360>] kthread+0xbc/0xc4 > > ... > > > This happens when the device is infinitely generating connected and then removed > > > (not manually, but due to some hardware issues). > > > > If I'm reading this right, it means that hub->hdev is NULL in > > I am not sure I am in sync here :( > > > hub_activate(). > > We would have gotten the crash right from hub_activate() in that case, isn't it? > > The fact that the call sequence reached mutex_lock() here, it means that > hub->hdev->dev was valid at least. The mutex dev->mutex is somewhat corrupted or > uninitialized, etc.. Ah, that makes a lot more sense. Thanks for staightening me out. And some pointer embedded in the mutex must have been NULL. > And that's where it all went wrong. As > __mutex_lock_slowpath() is called, it means that the mutex had a count of 0 > instead of 1 during the lock and then we crashed during __mutex_lock_slowpath(), > which can only happen if the mutex is uninitialized in the first place. > > The mutex gets initialized as part of device_add() and so things can go wrong if > device_add() was skipped here somehow. > > I may be completely wrong, but that's what I read :) Okay. But I don't see how device_add() could have been skipped. The hub_init_func2() call occurs after the original HUB_INIT hub_activate() call, which is in hub_configure() and occurs during probing. The hub interface doesn't get probed until the hub device is registered (the hub interface is a child of the hub device). Another possibility is that the mutex _was_ initialized but got corrupted somehow. Of course, that kind of thing is very hard to track down. On Thu, 1 Sep 2016, Vaibhav Hiremath wrote: > I have some more update on this, > > It seems the culprit was my laptop USB port (I have to say bad port), > which resulted into continuous connect/disconnect event, on both > laptop and phone (Android), which eventually resulted into kernel panic. Well, the bad port was the trigger. But even with a bad port, the kernel should not panic. > It could be a memory corruption or race somewhere (not sure though), > which is where device is not initialized properly when execution reached > to hub_activate(). > > I connected phone to another USB port and things started working as > expected. > > I can put prints to trace the execution flow, lets see what comes out > from it. How easily can you reproduce the problem? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html