Re: [PATCH 2/2] USB: hub: change the locking in hub_activate

Vaibhav Hiremath <vaibhav.hiremath@xxxxxxxxxx> · Thu, 1 Sep 2016 21:57:07 +0530






On Thursday 01 September 2016 07:59 PM, Alan Stern wrote:
On Thu, 1 Sep 2016, Viresh Kumar wrote:

On 31-08-16, 12:46, Alan Stern wrote:
On Wed, 31 Aug 2016, Viresh Kumar wrote:

On 05-08-16, 11:51, Alan Stern wrote:
+++ usb-4.x/drivers/usb/core/hub.c
@@ -1052,7 +1052,7 @@ static void hub_activate(struct usb_hub
  
  	/* Continue a partial initialization */
  	if (type == HUB_INIT2 || type == HUB_INIT3) {
-		device_lock(hub->intfdev);
+		device_lock(&hdev->dev);
Hi Alan,

I have received reports of kernel crashes (NULL dereference) due to this patch
in some of the corner cases. Note that we have backported this patch (and few
other) to 3.10 kernel. I have attached my hub.c file as well for reference.

Here is the reported kernel OOPs:

[   19.476228] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   19.476231] pgd = ffffffc00007d000
[   19.476236] [00000000] *pgd=000000000e90b003, *pmd=000000000e90c003, *pte=00e00000f9000407
[   19.476242] Internal error: Oops: 96000045 [#1] PREEMPT SMP
[   19.476273] Modules linked in: gb_vibrator(O) gb_usb(O) gb_uart(O) gb_spi(O) gb_sdio(O) gb_raw(O) gb_pwm(O) gb_power_supply(O) gb)
[   19.476279] CPU: 0 PID: 344 Comm: kworker/0:3 Tainted: G           O 3.10.97-g4b7224f-dirty #454
[   19.476290] Workqueue: events hub_init_func2
[   19.476293] task: ffffffc09b3560c0 ti: ffffffc09ada8000 task.ti: ffffffc09ada8000
[   19.476300] PC is at __mutex_lock_slowpath+0x138/0x224
[   19.476303] LR is at __mutex_lock_slowpath+0x128/0x224
...
[   19.476582] Call trace:
[   19.476586] [<ffffffc000ccf13c>] __mutex_lock_slowpath+0x138/0x224
[   19.476590] [<ffffffc000ccf254>] mutex_lock+0x2c/0x48
[   19.476593] [<ffffffc0006f6eac>] hub_activate+0x50/0x4d8
[   19.476596] [<ffffffc0006f7388>] hub_init_func2+0x14/0x1c
[   19.476602] [<ffffffc0002387ac>] process_one_work+0x26c/0x3cc
[   19.476605] [<ffffffc000239988>] worker_thread+0x208/0x358
[   19.476610] [<ffffffc00023f360>] kthread+0xbc/0xc4
...
This happens when the device is infinitely generating connected and then removed
(not manually, but due to some hardware issues).
If I'm reading this right, it means that hub->hdev is NULL in
I am not sure I am in sync here :(

hub_activate().
We would have gotten the crash right from hub_activate() in that case, isn't it?

The fact that the call sequence reached mutex_lock() here, it means that
hub->hdev->dev was valid at least. The mutex dev->mutex is somewhat corrupted or
uninitialized, etc..
Ah, that makes a lot more sense.  Thanks for staightening me out.  And
some pointer embedded in the mutex must have been NULL.

  And that's where it all went wrong. As
__mutex_lock_slowpath() is called, it means that the mutex had a count of 0
instead of 1 during the lock and then we crashed during __mutex_lock_slowpath(),
which can only happen if the mutex is uninitialized in the first place.

The mutex gets initialized as part of device_add() and so things can go wrong if
device_add() was skipped here somehow.

I may be completely wrong, but that's what I read :)
Okay.  But I don't see how device_add() could have been skipped.  The
hub_init_func2() call occurs after the original HUB_INIT hub_activate()
call, which is in hub_configure() and occurs during probing.  The hub
interface doesn't get probed until the hub device is registered (the
hub interface is a child of the hub device).

Another possibility is that the mutex _was_ initialized but got
corrupted somehow.  Of course, that kind of thing is very hard to track
down.


On Thu, 1 Sep 2016, Vaibhav Hiremath wrote:

I have some more update on this,

It seems the culprit was my laptop USB port (I have to say bad port),
which resulted into continuous connect/disconnect event, on both
laptop and phone (Android), which eventually resulted into kernel panic.
Well, the bad port was the trigger.  But even with a bad port, the
kernel should not panic.

Yes, certainly kernel should not panic.
It could be a memory corruption or race somewhere (not sure though),
which is where device is not initialized properly when execution reached
to hub_activate().

I connected phone to another USB port and things started working as
expected.

I can put prints to trace the execution flow, lets see what comes out
from it.
How easily can you reproduce the problem?

Very easily. I would say 7-8 times out of 10.

Thanks,
Vaibhav
Alan Stern


--
Thanks,
Vaibhav

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html