On Mon, Dec 07, 2015 at 05:11:50PM +0800, Lu Baolu wrote: > Hi Peter, > > Have you ever tried disabling auto-pm? Did things go smoothly if auto-pm is disabled? > > I always disable usb auto-pm in below way. > > # echo on | tee /sys/bus/usb/devices/*/power/control > # echo on > /sys/bus/pci/devices/<bus_name>/power/control > > Thanks, > Baolu Hi Baolu, The deadlock does not seem to occur with auto-PM disabled, but that is a workaround for the issue. The hang can always be reproduced under this test: - Start a QEMU VM, passing through the USB adapter - This VM boots to a busybox shell with no other services running or udev magic (to reduce interference). - Enable runtime PM for all devices by default (see script below) - From the console, invoke "ip link set eth1 up" (eth0 is a virtio adapter). # somewhere in /init after mounting filesystems echo /sbin/hotplug > /proc/sys/kernel/hotplug echo auto | tee /sys/bus/pci/devices/*/power/control \ /sys/bus/usb/devices/*/power/control >/dev/null #!/bin/sh # /sbin/hotplug path="/sys/$DEVPATH/power/control" [ -e "$path" ] || return newval=auto read status < "$path" if [ "x$status" != "x$newval" ]; then echo "$DEVPATH: $status -> $newval" >/dev/kmsg echo $newval > "$path" fi With "auto", the ip command hangs (a trace can be found on the bottom of this mail). With "on", it does not. If I keep a loop spinning that invokes `ethtool eth1`, the command returns immediately without issues (presumably because the device is not suspended through runtime PM). Under some circumstances I get a lockdep warning (when trying to bring an interface down if I remember correctly). Its trace can be found on the bottom of this mail. I'll keep testing. For the lockdep warning, my initial guess is that calling schedule_delayed_work_sync under tp->lock is a bad idea because scheduled work can execute and try to claim tp->lock too. Maybe there are two different lockup cases here, I'll keep testing. Kind regards, Peter > On 12/05/2015 06:59 PM, Peter Wu wrote: > > Hi, > > > > I rarely use a Realtek USB 3.0 Gigabit Ethernet adapter (vid/pid > > 0bda:8153), but when I did last night, it resulted in a lockup of > > processes doing networking ("ip link", "ping", "ethtool", ...). > > > > A (few) minute(s) before that event, I noticed that there was no network > > connectivity (ping hung) which was somehow solved by invoking "ethtool > > eth1" (triggering runtime pm wakeup?). This same trick did not work at > > the next event. Invoking "ethtool eth1", "ip link", etc. hung completely > > and interrupt (^C) did not work at all. > > > > Since that did not work, I pulled the USB adapter and re-inserted it, > > hoping it would reset things. That did not work at all, there was a > > "usb disconnect" message, but no further driver messages. > > > > Fast forward an hour, and it has become a disaster. I have terminated > > and killed many programs via SysRq but am still unable to get a stable > > system that does not hang on network I/O. Even the suspend process > > fails so in the end I attempted to shutdown the system. After half an > > hour after getting the poweroff message, I issued SysRq + B to reboot > > (since SysRq + O did not shut down either). > > > > Attached are logs with various various backtraces from SysRq and failed > > suspend. Let me know if you need more information! > > > > By the way, often I have to rmmod xhci and re-insert it, otherwise > > plugging it in does not result in a detection. A USB 2.0 port does not > > have this problem (runtime PM is enabled for all devices). This is the > > USB 3.0 port: > > > > 02:00.0 USB controller [0c03]: NEC Corporation uPD720200 USB 3.0 > > Host Controller [1033:0194] (rev 03) > -- lockdep splat from the bare machine: ====================================================== [ INFO: possible circular locking dependency detected ] 4.3.0-custom #1 Tainted: G O ------------------------------------------------------- kworker/0:1/38 is trying to acquire lock: (&tp->control){+.+.+.}, at: [<ffffffffa05d3544>] rtl8152_resume+0x24/0x130 [r8152] but task is already holding lock: ((&(&tp->schedule)->work)){+.+.+.}, at: [<ffffffff8107819c>] process_one_work+0x15c/0x660 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 ((&(&tp->schedule)->work)){+.+.+.}: [<ffffffff810b17d3>] lock_acquire+0xc3/0x1d0 [<ffffffff8107751d>] flush_work+0x3d/0x290 [<ffffffff810791ae>] __cancel_work_timer+0xfe/0x1c0 [<ffffffff810792a3>] cancel_delayed_work_sync+0x13/0x20 [<ffffffffa05d0dea>] rtl8152_set_speed+0x2a/0x260 [r8152] [<ffffffffa05d4a26>] rtl8152_open+0x396/0x4f0 [r8152] [<ffffffff81567ebf>] __dev_open+0xaf/0x120 [<ffffffff815681ad>] __dev_change_flags+0x9d/0x160 [<ffffffff81568299>] dev_change_flags+0x29/0x70 [<ffffffff8157785a>] do_setlink+0x5ba/0xb00 [<ffffffff81578469>] rtnl_newlink+0x5a9/0x8a0 [<ffffffff81576bb4>] rtnetlink_rcv_msg+0x84/0x210 [<ffffffff81599ab7>] netlink_rcv_skb+0x97/0xb0 [<ffffffff81576b1a>] rtnetlink_rcv+0x2a/0x40 [<ffffffff8159943e>] netlink_unicast+0x1be/0x2d0 [<ffffffff8159983b>] netlink_sendmsg+0x2eb/0x3a0 [<ffffffff81545428>] sock_sendmsg+0x38/0x50 [<ffffffff81545d42>] ___sys_sendmsg+0x292/0x2a0 [<ffffffff81546cc2>] __sys_sendmsg+0x42/0x80 [<ffffffff81546d12>] SyS_sendmsg+0x12/0x20 [<ffffffff816707d7>] entry_SYSCALL_64_fastpath+0x12/0x6f -> #0 (&tp->control){+.+.+.}: [<ffffffff810b0ed7>] __lock_acquire+0x1a47/0x1a50 [<ffffffff810b17d3>] lock_acquire+0xc3/0x1d0 [<ffffffff8166c0a7>] mutex_lock_nested+0x67/0x3c0 [<ffffffffa05d3544>] rtl8152_resume+0x24/0x130 [r8152] [<ffffffff814d735f>] usb_resume_interface.isra.6+0x9f/0x100 [<ffffffff814d75fa>] usb_resume_both+0x6a/0x130 [<ffffffff814d869a>] usb_runtime_resume+0x1a/0x20 [<ffffffff814602c2>] __rpm_callback+0x32/0x70 [<ffffffff8146035f>] rpm_callback+0x5f/0x80 [<ffffffff81460c4b>] rpm_resume+0x4db/0x7d0 [<ffffffff81460b0c>] rpm_resume+0x39c/0x7d0 [<ffffffff8146211e>] __pm_runtime_resume+0x4e/0x80 [<ffffffff814d7112>] usb_autopm_get_interface+0x22/0x60 [<ffffffffa05d4cf0>] rtl_work_func_t+0x40/0x3ee [r8152] [<ffffffff8107822d>] process_one_work+0x1ed/0x660 [<ffffffff810786ee>] worker_thread+0x4e/0x450 [<ffffffff8107f13f>] kthread+0xef/0x110 [<ffffffff81670b4f>] ret_from_fork+0x3f/0x70 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((&(&tp->schedule)->work)); lock(&tp->control); lock((&(&tp->schedule)->work)); lock(&tp->control); *** DEADLOCK *** 2 locks held by kworker/0:1/38: #0: ("events"){.+.+.+}, at: [<ffffffff8107819c>] process_one_work+0x15c/0x660 #1: ((&(&tp->schedule)->work)){+.+.+.}, at: [<ffffffff8107819c>] process_one_work+0x15c/0x660 stack backtrace: CPU: 0 PID: 38 Comm: kworker/0:1 Tainted: G O 4.3.0-custom #1 Hardware name: CLEVO CO. B7130 /B7130 , BIOS 6.00 08/27/2010 Workqueue: events rtl_work_func_t [r8152] ffffffff8264d7d0 ffff8802321ff950 ffffffff8134cdac ffffffff8264d7d0 ffff8802321ff990 ffffffff810adbb3 ffff8802321ff9e0 ffff8802320e50b8 0000000000000002 0000000000000001 ffff8802320e50e0 ffff8802320e4880 Call Trace: [<ffffffff8134cdac>] dump_stack+0x4e/0x82 [<ffffffff810adbb3>] print_circular_bug+0x1e3/0x250 [<ffffffff810b0ed7>] __lock_acquire+0x1a47/0x1a50 [<ffffffff810c505d>] ? debug_lockdep_rcu_enabled+0x1d/0x20 [<ffffffff810b17d3>] lock_acquire+0xc3/0x1d0 [<ffffffffa05d3544>] ? rtl8152_resume+0x24/0x130 [r8152] [<ffffffffa05d3544>] ? rtl8152_resume+0x24/0x130 [r8152] [<ffffffff8166c0a7>] mutex_lock_nested+0x67/0x3c0 [<ffffffffa05d3544>] ? rtl8152_resume+0x24/0x130 [r8152] [<ffffffff8166d7be>] ? mutex_unlock+0xe/0x10 [<ffffffff814d8680>] ? usb_runtime_suspend+0x70/0x70 [<ffffffffa05d3544>] rtl8152_resume+0x24/0x130 [r8152] [<ffffffff814d735f>] usb_resume_interface.isra.6+0x9f/0x100 [<ffffffff814d75fa>] usb_resume_both+0x6a/0x130 [<ffffffff814d869a>] usb_runtime_resume+0x1a/0x20 [<ffffffff814602c2>] __rpm_callback+0x32/0x70 [<ffffffff8146035f>] rpm_callback+0x5f/0x80 [<ffffffff814d8680>] ? usb_runtime_suspend+0x70/0x70 [<ffffffff81460c4b>] rpm_resume+0x4db/0x7d0 [<ffffffff81460b0c>] rpm_resume+0x39c/0x7d0 [<ffffffff8146211e>] __pm_runtime_resume+0x4e/0x80 [<ffffffff814d7112>] usb_autopm_get_interface+0x22/0x60 [<ffffffffa05d4cf0>] rtl_work_func_t+0x40/0x3ee [r8152] [<ffffffff8107822d>] process_one_work+0x1ed/0x660 [<ffffffff8107819c>] ? process_one_work+0x15c/0x660 [<ffffffff810786ee>] worker_thread+0x4e/0x450 [<ffffffff810786a0>] ? process_one_work+0x660/0x660 [<ffffffff8107f13f>] kthread+0xef/0x110 [<ffffffff8107f050>] ? kthread_create_on_node+0x200/0x200 [<ffffffff81670b4f>] ret_from_fork+0x3f/0x70 [<ffffffff8107f050>] ? kthread_create_on_node+0x200/0x200 Hung task message in the VM after "ip link set eth1 up" INFO: task kworker/1:1:27 blocked for more than 60 seconds. Not tainted 4.4.0-rc3-rtlwifi+ #9 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/1:1 D ffff88000de8bca0 0 27 2 0x00000000 Workqueue: events linkwatch_event ffff88000de8bca0 ffff88000de78808 000000000de8bc98 ffff88000dcd0000 ffff88000de78000 ffff88000de8c000 0000000000000246 ffffffff81ceb428 ffff88000de78000 00000000ffffffff ffff88000de8bcb8 ffffffff8173a23c Call Trace: [<ffffffff8173a23c>] schedule+0x3c/0x90 [<ffffffff8173a2a5>] schedule_preempt_disabled+0x15/0x20 [<ffffffff8173b9dc>] mutex_lock_nested+0x16c/0x390 [<ffffffff815c9cd7>] ? rtnl_lock+0x17/0x20 [<ffffffff815c9cd7>] ? rtnl_lock+0x17/0x20 [<ffffffff815c9cd7>] rtnl_lock+0x17/0x20 [<ffffffff815cd80e>] linkwatch_event+0xe/0x30 [<ffffffff810c22f7>] process_one_work+0x1f7/0x640 [<ffffffff810c226c>] ? process_one_work+0x16c/0x640 [<ffffffff810c278e>] worker_thread+0x4e/0x450 [<ffffffff810c2740>] ? process_one_work+0x640/0x640 [<ffffffff810c2740>] ? process_one_work+0x640/0x640 [<ffffffff810c97c2>] kthread+0xf2/0x110 [<ffffffff810c96d0>] ? __kthread_parkme+0x90/0x90 [<ffffffff81740bbf>] ret_from_fork+0x3f/0x70 [<ffffffff810c96d0>] ? __kthread_parkme+0x90/0x90 3 locks held by kworker/1:1/27: #0: ("events"){.+.+.+}, at: [<ffffffff810c226c>] process_one_work+0x16c/0x640 #1: ((linkwatch_work).work){+.+...}, at: [<ffffffff810c226c>] process_one_work+0x16c/0x640 #2: (rtnl_mutex){+.+.+.}, at: [<ffffffff815c9cd7>] rtnl_lock+0x17/0x20 -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html