Re: [PATCH 2/2] USB: hub: change the locking in hub_activate

Viresh Kumar <viresh.kumar@xxxxxxxxxx> · Thu, 1 Sep 2016 10:32:41 +0530

On 31-08-16, 12:46, Alan Stern wrote:
> On Wed, 31 Aug 2016, Viresh Kumar wrote:
> 
> > On 05-08-16, 11:51, Alan Stern wrote:
> > > +++ usb-4.x/drivers/usb/core/hub.c
> > > @@ -1052,7 +1052,7 @@ static void hub_activate(struct usb_hub
> > >  
> > >  	/* Continue a partial initialization */
> > >  	if (type == HUB_INIT2 || type == HUB_INIT3) {
> > > -		device_lock(hub->intfdev);
> > > +		device_lock(&hdev->dev);
> > 
> > Hi Alan,
> > 
> > I have received reports of kernel crashes (NULL dereference) due to this patch
> > in some of the corner cases. Note that we have backported this patch (and few
> > other) to 3.10 kernel. I have attached my hub.c file as well for reference.
> > 
> > Here is the reported kernel OOPs:
> > 
> > [   19.476228] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > [   19.476231] pgd = ffffffc00007d000
> > [   19.476236] [00000000] *pgd=000000000e90b003, *pmd=000000000e90c003, *pte=00e00000f9000407
> > [   19.476242] Internal error: Oops: 96000045 [#1] PREEMPT SMP
> > [   19.476273] Modules linked in: gb_vibrator(O) gb_usb(O) gb_uart(O) gb_spi(O) gb_sdio(O) gb_raw(O) gb_pwm(O) gb_power_supply(O) gb)
> > [   19.476279] CPU: 0 PID: 344 Comm: kworker/0:3 Tainted: G           O 3.10.97-g4b7224f-dirty #454
> > [   19.476290] Workqueue: events hub_init_func2
> > [   19.476293] task: ffffffc09b3560c0 ti: ffffffc09ada8000 task.ti: ffffffc09ada8000
> > [   19.476300] PC is at __mutex_lock_slowpath+0x138/0x224
> > [   19.476303] LR is at __mutex_lock_slowpath+0x128/0x224
> ...
> > [   19.476582] Call trace:
> > [   19.476586] [<ffffffc000ccf13c>] __mutex_lock_slowpath+0x138/0x224
> > [   19.476590] [<ffffffc000ccf254>] mutex_lock+0x2c/0x48
> > [   19.476593] [<ffffffc0006f6eac>] hub_activate+0x50/0x4d8
> > [   19.476596] [<ffffffc0006f7388>] hub_init_func2+0x14/0x1c
> > [   19.476602] [<ffffffc0002387ac>] process_one_work+0x26c/0x3cc
> > [   19.476605] [<ffffffc000239988>] worker_thread+0x208/0x358
> > [   19.476610] [<ffffffc00023f360>] kthread+0xbc/0xc4
> ...
> > This happens when the device is infinitely generating connected and then removed
> > (not manually, but due to some hardware issues).
> 
> If I'm reading this right, it means that hub->hdev is NULL in

I am not sure I am in sync here :(

> hub_activate().

We would have gotten the crash right from hub_activate() in that case, isn't it?

The fact that the call sequence reached mutex_lock() here, it means that
hub->hdev->dev was valid at least. The mutex dev->mutex is somewhat corrupted or
uninitialized, etc.. And that's where it all went wrong. As
__mutex_lock_slowpath() is called, it means that the mutex had a count of 0
instead of 1 during the lock and then we crashed during __mutex_lock_slowpath(),
which can only happen if the mutex is uninitialized in the first place.

The mutex gets initialized as part of device_add() and so things can go wrong if
device_add() was skipped here somehow.

I may be completely wrong, but that's what I read :)

> I don't see how that could be true; it is initialized
> to a non-NULL value and then never changed until the hub structure is
> deallocated.
> 
> Is it possible to add some debugging printk's in there to find out 
> what's happening?

I will ask Vaibhav (cc'd) to do it, not sure if he is around today or not.

Thanks for your quick reply.

-- 
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html