Re: [PATCH] USB: fix invalid memory access in hub_activate()

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Thu, 4 Aug 2016 11:20:15 -0400 (EDT)

Viresh, I believe this bug report indicates we need to use the 
alternate approach instead of your "usb: hub: Fix unbalanced reference 
count and memory leak" patch.

On Thu, 4 Aug 2016 mgautam@xxxxxxxxxxxxxx wrote:

> On 2015-12-17 00:02, Alan Stern wrote:
> > Commit 8520f38099cc ("USB: change hub initialization sleeps to
> > delayed_work") changed the hub_activate() routine to make part of it
> > run in a workqueue.  However, the commit failed to take a reference to
> > the usb_hub structure or to lock the hub interface while doing so.  As
> > a result, if a hub is plugged in and quickly unplugged before the work
> > routine can run, the routine will try to access memory that has been
> > deallocated.  Or, if the hub is unplugged while the routine is
> > running, the memory may be deallocated while it is in active use.
> > 
> > This patch fixes the problem by taking a reference to the usb_hub at
> > the start of hub_activate() and releasing it at the end (when the work
> > is finished), and by locking the hub interface while the work routine
> > is running.  It also adds a check at the start of the routine to see
> > if the hub has already been disconnected, in which nothing should be
> > done.
> 
 > Could this change result in below mutex lock that I am seeing:
> "kworker/3:2" call stack:
> {code}
> -000|__switch_to(prev = 0xFFFFFFC032EAAA00, next = 0xFFFFFFC075D59C00)
> -001|context_switch(inline)
> -001|__schedule()
> -002|schedule()
> -003|schedule_preempt_disabled()
> -004|__mutex_lock_common(inline)
> -004|__mutex_lock_slowpath(lock_count = 0xFFFFFFC07BE94580)
> -005|current_thread_info(inline)
> -005|mutex_set_owner(inline)
> -005|mutex_lock(
>      |    lock = 0xFFFFFFC00255EE90 -> (
>      |      count = (counter = -1),
>      |      wait_lock = (rlock = (raw_lock = (owner = 4, next = 4), magic 
> = 3735899821, owner_cpu = 4294967295, owner = 0xFFFFFFFFFFFFFFFF)),
>      |      wait_list = (next = 0xFFFFFFC0457DBC78, prev = 
> 0xFFFFFFC0457DBC78),
>      |      owner = 0xFFFFFFC075CD5400,
>      |      name = 0x0,
>      |      magic = 0xFFFFFFC00255EE90))
> -006|hub_activate(hub = 0xFFFFFFC00253A680, type = 472914991)
> -007|hub_init_func2(?)
> -008|static_key_count(inline)
> -008|static_key_false(inline)
> -008|trace_workqueue_execute_end(inline)
> -008|process_one_work(worker = 0xFFFFFFC017C8CA00, work = 
> 0xFFFFFFC00253A848)
> -009|worker_thread(__worker = 0xFFFFFFC017C8CA00)
> 
> 
> 
> "kworker/u16:0" call stack:
> {code}
> -000|__switch_to(prev = 0xFFFFFFC075CD5400, next = 0xFFFFFFC024510000)
> -001|context_switch(inline)
> -001|__schedule()
> -002|schedule()
> -003|schedule_timeout(timeout = -272798837376)
> -004|do_wait_for_common(inline)
> -004|__wait_for_common(inline)
> -004|wait_for_common(x = 0xFFFFFFC075D3F928, timeout = 
> 9223372036854775807, ?)
> -005|wait_for_completion(?)
> -006|flush_work(                                <---------- wq is 
> already blocked for the mutex above
>      |    work = 0xFFFFFFC00253A848 -> (
>      |      data = (counter = 417),
>      |      entry = (next = 0xFFFFFFC00253A850, prev = 
> 0xFFFFFFC00253A850),
>      |      func = 0xFFFFFFC000659B20 -> ))
>      |  barr = (work = (data = (counter = -272799003163), entry = (next = 
> 0xFFFFFFC017C8CA30, prev = 0xFFFFFFC017C8CA30), func =
>      |  pool = 0xFFFFFFC0703784C0
>      |  pwq = 0xFFFFFFC07BE98C00
>      |  linked = 26644480
>      |  __key = ()
> -007|clear_work_data(inline)
> -007|__cancel_work_timer(work = 0xFFFFFFC00253A848, is_dwork = FALSE)
> -008|cancel_delayed_work_sync(?)
> -009|hub_quiesce(hub = 0xFFFFFFC00253A680, ?)
> -010|hub_disconnect(intf = 0xFFFFFFC00255EE00)
> -011|usb_unbind_interface(dev = 0xFFFFFFC00255EE30)
> -012|__device_release_driver(dev = 0xCB88537FDC8CAE50)
> -013|device_unlock(inline)
> -013|device_release_driver(dev = 0xFFFFFFC00255EE30)
> -014|bus_remove_device(dev = 0xFFFFFFC00255EE30)
> -015|device_del(dev = 0xFFFFFFC00255EE30)
> -016|usb_disable_device(dev = 0xFFFFFFC032C42600, skip_ep0 = 0)
> -017|usb_disconnect(pdev = 0xFFFFFFC075D3FCA0)
> -018|usb_remove_hcd(hcd = 0xFFFFFFC0700AE880)

Yes, it could lead to this deadlock.

> Should we rather have device_trylock instead of device_lock

No.

> > @@ -1031,10 +1031,20 @@ static void hub_activate(struct usb_hub
> >  	unsigned delay;
> > 
> >  	/* Continue a partial initialization */
> > -	if (type == HUB_INIT2)
> > -		goto init2;
> > -	if (type == HUB_INIT3)
> > +	if (type == HUB_INIT2 || type == HUB_INIT3) {
> > +		device_lock(hub->intfdev);
> 
> device_trylock?? If failed to acquire then treat that as disconnect?

No, that's not the right answer.  I believe the correct solution is to
remove the cancel_delayed_work_sync() call in hub_quiesce().

Viresh, do you agree?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html