Viresh, I believe this bug report indicates we need to use the alternate approach instead of your "usb: hub: Fix unbalanced reference count and memory leak" patch. On Thu, 4 Aug 2016 mgautam@xxxxxxxxxxxxxx wrote: > On 2015-12-17 00:02, Alan Stern wrote: > > Commit 8520f38099cc ("USB: change hub initialization sleeps to > > delayed_work") changed the hub_activate() routine to make part of it > > run in a workqueue. However, the commit failed to take a reference to > > the usb_hub structure or to lock the hub interface while doing so. As > > a result, if a hub is plugged in and quickly unplugged before the work > > routine can run, the routine will try to access memory that has been > > deallocated. Or, if the hub is unplugged while the routine is > > running, the memory may be deallocated while it is in active use. > > > > This patch fixes the problem by taking a reference to the usb_hub at > > the start of hub_activate() and releasing it at the end (when the work > > is finished), and by locking the hub interface while the work routine > > is running. It also adds a check at the start of the routine to see > > if the hub has already been disconnected, in which nothing should be > > done. > > Could this change result in below mutex lock that I am seeing: > "kworker/3:2" call stack: > {code} > -000|__switch_to(prev = 0xFFFFFFC032EAAA00, next = 0xFFFFFFC075D59C00) > -001|context_switch(inline) > -001|__schedule() > -002|schedule() > -003|schedule_preempt_disabled() > -004|__mutex_lock_common(inline) > -004|__mutex_lock_slowpath(lock_count = 0xFFFFFFC07BE94580) > -005|current_thread_info(inline) > -005|mutex_set_owner(inline) > -005|mutex_lock( > | lock = 0xFFFFFFC00255EE90 -> ( > | count = (counter = -1), > | wait_lock = (rlock = (raw_lock = (owner = 4, next = 4), magic > = 3735899821, owner_cpu = 4294967295, owner = 0xFFFFFFFFFFFFFFFF)), > | wait_list = (next = 0xFFFFFFC0457DBC78, prev = > 0xFFFFFFC0457DBC78), > | owner = 0xFFFFFFC075CD5400, > | name = 0x0, > | magic = 0xFFFFFFC00255EE90)) > -006|hub_activate(hub = 0xFFFFFFC00253A680, type = 472914991) > -007|hub_init_func2(?) > -008|static_key_count(inline) > -008|static_key_false(inline) > -008|trace_workqueue_execute_end(inline) > -008|process_one_work(worker = 0xFFFFFFC017C8CA00, work = > 0xFFFFFFC00253A848) > -009|worker_thread(__worker = 0xFFFFFFC017C8CA00) > > > > "kworker/u16:0" call stack: > {code} > -000|__switch_to(prev = 0xFFFFFFC075CD5400, next = 0xFFFFFFC024510000) > -001|context_switch(inline) > -001|__schedule() > -002|schedule() > -003|schedule_timeout(timeout = -272798837376) > -004|do_wait_for_common(inline) > -004|__wait_for_common(inline) > -004|wait_for_common(x = 0xFFFFFFC075D3F928, timeout = > 9223372036854775807, ?) > -005|wait_for_completion(?) > -006|flush_work( <---------- wq is > already blocked for the mutex above > | work = 0xFFFFFFC00253A848 -> ( > | data = (counter = 417), > | entry = (next = 0xFFFFFFC00253A850, prev = > 0xFFFFFFC00253A850), > | func = 0xFFFFFFC000659B20 -> )) > | barr = (work = (data = (counter = -272799003163), entry = (next = > 0xFFFFFFC017C8CA30, prev = 0xFFFFFFC017C8CA30), func = > | pool = 0xFFFFFFC0703784C0 > | pwq = 0xFFFFFFC07BE98C00 > | linked = 26644480 > | __key = () > -007|clear_work_data(inline) > -007|__cancel_work_timer(work = 0xFFFFFFC00253A848, is_dwork = FALSE) > -008|cancel_delayed_work_sync(?) > -009|hub_quiesce(hub = 0xFFFFFFC00253A680, ?) > -010|hub_disconnect(intf = 0xFFFFFFC00255EE00) > -011|usb_unbind_interface(dev = 0xFFFFFFC00255EE30) > -012|__device_release_driver(dev = 0xCB88537FDC8CAE50) > -013|device_unlock(inline) > -013|device_release_driver(dev = 0xFFFFFFC00255EE30) > -014|bus_remove_device(dev = 0xFFFFFFC00255EE30) > -015|device_del(dev = 0xFFFFFFC00255EE30) > -016|usb_disable_device(dev = 0xFFFFFFC032C42600, skip_ep0 = 0) > -017|usb_disconnect(pdev = 0xFFFFFFC075D3FCA0) > -018|usb_remove_hcd(hcd = 0xFFFFFFC0700AE880) Yes, it could lead to this deadlock. > Should we rather have device_trylock instead of device_lock No. > > @@ -1031,10 +1031,20 @@ static void hub_activate(struct usb_hub > > unsigned delay; > > > > /* Continue a partial initialization */ > > - if (type == HUB_INIT2) > > - goto init2; > > - if (type == HUB_INIT3) > > + if (type == HUB_INIT2 || type == HUB_INIT3) { > > + device_lock(hub->intfdev); > > device_trylock?? If failed to acquire then treat that as disconnect? No, that's not the right answer. I believe the correct solution is to remove the cancel_delayed_work_sync() call in hub_quiesce(). Viresh, do you agree? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html