On Tue, Jun 4, 2019 at 12:50 PM Hans de Goede <hdegoede@xxxxxxxxxx> wrote: > > Hi, > > On 04-06-19 12:08, Benjamin Tissoires wrote: > > On Tue, Jun 4, 2019 at 9:51 AM Benjamin Tissoires > > <benjamin.tissoires@xxxxxxxxxx> wrote: > >> > >> On Mon, Jun 3, 2019 at 4:17 PM Hans de Goede <hdegoede@xxxxxxxxxx> wrote: > >>> > >>> Hi, > >>> > >>> On 03-06-19 15:55, Benjamin Tissoires wrote: > >>>> On Mon, Jun 3, 2019 at 11:51 AM Hans de Goede <hdegoede@xxxxxxxxxx> wrote: > >>>>> > >>>>> Hi Again, > >>>>> > >>>>> On 03-06-19 11:11, Hans de Goede wrote: > >>>>> <snip> > >>>>> > >>>>>>> not sure about the rest of logitech issues yet) next week. > >>>>>> > >>>>>> The main problem seems to be the request_module patches. Although I also > >>>> > >>>> Can't we use request_module_nowait() instead, and set a reasonable > >>>> timeout that we detect only once to check if userspace is compatible: > >>>> > >>>> In pseudo-code: > >>>> if (!request_module_checked) { > >>>> request_module_nowait(name); > >>>> use_request_module = wait_event_timeout(wq, > >>>> first_module_loaded, 10 seconds in jiffies); > >>>> request_module_checked = true; > >>>> } else if (use_request_module) { > >>>> request_module(name); > >>>> } > >>> > >>> Well looking at the just attached dmesg , the modprobe > >>> when triggered by udev from userspace succeeds in about > >>> 0.5 seconds, so it seems that the modprobe hangs happens > >>> when called from within the kernel rather then from within > >>> userspace. > >>> > >>> What I do not know if is the hang is inside userspace, or > >>> maybe it happens when modprobe calls back into the kernel, > >>> if the hang happens when modprobe calls back into the kernel, > >>> then other modprobes (done from udev) likely will hang too > >>> since I think only 1 modprobe can happen at a time. > >>> > >>> I really wish we knew what distinguished working systems > >>> from non working systems :| > >>> > >>> I cannot find a common denominator; other then the systems > >>> are not running Fedora. So far we've reports from both Ubuntu 16.04 > >>> and Tumbleweed, so software version wise these 2 are wide apart. > >> > >> I am trying to reproduce the lock locally, and installed an opensuse > >> Tumbleweed in a VM. When forwarding a Unifying receiver to the VM, I > >> do not see the lock with either my vanilla compiled kernel and the rpm > >> found in http://download.opensuse.org/repositories/Kernel:/HEAD/standard/x86_64/ > >> > >> Next step is install Tumbleweed on bare metal, but I do not see how > >> this could introduce a difference (maybe USB2 vs 3). > > > > Making progress here. > > > > The difference between Ubuntu/Tumbleweed and Fedora: usbhid is shipped > > as a module while in Fedora usbhid is included in the kernel. > > > > If I rmmod hid_* and usbhid, then modprobe usbhid, the command hangs > > for 3 minutes. > > If usbhid is already loaded, inserting a receiver is immediate > > regarding the loading of the external modules. > > > > So my assumption is that when the device gets detected at boot, usbhid > > gets loaded by the kernel event, which in turns attempts to call > > __request_module, but the modprobe can't be fulfilled because it's > > already waiting for the initial usbhid modprobe to finish. > > > > Still don't know how to solve that, but I thought I should share. > > Hmm, we may be hitting the scenario described in the big comment > around line 3500 of kernel/module.c. Well, we are not locking during do_init_module(), but in waiting for the completion of request_module(). So as I read the trace, we wait for userspace to call/terminate modprobe. > > But I'm not sure that is what is happening here. > > Maybe you can put a WARN_ON(1) in request_module and look at the > backtrace ? That may help to figure out what is going on; or > alternatively it might help to find some way to detect this and > if it happens skip the request_module... Ftrace is much easier to deal with: --- /sys/kernel/debug/tracing # cat trace # tracer: function # # entries-in-buffer/entries-written: 4/4 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | modprobe-23236 [003] .... 9191.880917: __request_module <-hid_add_device modprobe-23236 [003] .... 9191.880937: <stack trace> => 0xffffffffc0851061 => __request_module => hid_add_device => usbhid_probe => usb_probe_interface => really_probe => driver_probe_device => device_driver_attach => __driver_attach => bus_for_each_dev => bus_add_driver => driver_register => usb_register_driver => hid_init => do_one_initcall => do_init_module => load_module => __do_sys_finit_module => do_syscall_64 => entry_SYSCALL_64_after_hwframe modprobe-23236 [003] .... 9191.892765: __request_module <-hid_add_device modprobe-23236 [003] .... 9191.892784: <stack trace> => 0xffffffffc0851061 => __request_module => hid_add_device => usbhid_probe => usb_probe_interface => really_probe => driver_probe_device => device_driver_attach => __driver_attach => bus_for_each_dev => bus_add_driver => driver_register => usb_register_driver => hid_init => do_one_initcall => do_init_module => load_module => __do_sys_finit_module => do_syscall_64 => entry_SYSCALL_64_after_hwframe --- This is what happen with the logitech receiver plugged in and while modprobing usbhid. The modprobe command hangs, and I took the trace here. If I hit Ctrl-C, the modprobe loading continue properly. There is nothing special here in the stack trace, except that there are 2 calls to request_module here (one for the touchscreen I guess and one for the logitech receiver). I have attached the call graph of the same sequence: wait_for_completion_killable() seems to be the culprit, we are waiting for userspace to notify it has done calling modprobe. My idea would be to defer any call to hid_add_device() into a workqueue and see if that unlocks the situation. Cheers, Benjamin
/sys/kernel/debug/tracing # cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) | __request_module() { 0) 0.256 us | current_is_async(); 0) | security_kernel_module_request() { 0) 0.307 us | integrity_kernel_module_request(); 0) 0.625 us | } 0) | kmem_cache_alloc_trace() { 0) | _cond_resched() { 0) 0.123 us | rcu_all_qs(); 0) 0.355 us | } 0) 0.122 us | should_failslab(); 0) | _cond_resched() { 0) 0.120 us | rcu_all_qs(); 0) 0.355 us | } 0) 0.120 us | memcg_kmem_put_cache(); 0) 1.563 us | } 0) | kstrdup() { 0) | __kmalloc_track_caller() { 0) 0.115 us | kmalloc_slab(); 0) | _cond_resched() { 0) 0.118 us | rcu_all_qs(); 0) 0.348 us | } 0) 0.119 us | should_failslab(); 0) | _cond_resched() { 0) 0.115 us | rcu_all_qs(); 0) 0.345 us | } 0) 0.119 us | memcg_kmem_put_cache(); 0) 1.757 us | } 0) 2.040 us | } 0) | call_usermodehelper_setup() { 0) | kmem_cache_alloc_trace() { 0) | _cond_resched() { 0) 0.116 us | rcu_all_qs(); 0) 0.345 us | } 0) 0.117 us | should_failslab(); 0) | _cond_resched() { 0) 0.118 us | rcu_all_qs(); 0) 0.347 us | } 0) 0.123 us | memcg_kmem_put_cache(); 0) 1.792 us | } 0) 2.023 us | } 0) | call_usermodehelper_exec() { 0) | queue_work_on() { 0) | __queue_work() { 0) 0.126 us | get_work_pool(); 0) 0.183 us | _raw_spin_lock(); 0) | insert_work() { 0) 0.121 us | get_pwq.isra.0(); 0) | wake_up_process() { 0) | try_to_wake_up() { 0) 0.227 us | _raw_spin_lock_irqsave(); 0) | select_task_rq_fair() { 0) 0.133 us | available_idle_cpu(); 0) 0.166 us | update_cfs_rq_h_load(); 0) | select_idle_sibling() { 0) 0.137 us | available_idle_cpu(); 0) 0.371 us | } 0) 1.299 us | } 0) 0.133 us | _raw_spin_lock(); 0) 0.148 us | update_rq_clock(); 0) | ttwu_do_activate() { 0) | activate_task() { 0) | enqueue_task_fair() { 0) | enqueue_entity() { 0) 0.123 us | update_curr(); 0) | __update_load_avg_se() { 0) 0.118 us | __accumulate_pelt_segments(); 0) 0.420 us | } 0) 0.125 us | __update_load_avg_cfs_rq(); 0) 0.122 us | update_cfs_group(); 0) 0.138 us | account_entity_enqueue(); 0) 0.120 us | place_entity(); 0) 0.119 us | __enqueue_entity(); 0) 2.146 us | } 0) 0.115 us | hrtick_update(); 0) 2.624 us | } 0) 2.890 us | } 0) | ttwu_do_wakeup() { 0) | check_preempt_curr() { 0) 0.130 us | resched_curr(); 0) 0.417 us | } 0) 1.034 us | } 0) 4.269 us | } 0) 0.410 us | _raw_spin_unlock_irqrestore(); 0) 7.466 us | } 0) 7.702 us | } 0) 8.164 us | } 0) 9.079 us | } 0) 9.370 us | } 0) | wait_for_completion_killable() { 0) | _cond_resched() { 0) 0.118 us | rcu_all_qs(); 0) 0.365 us | } 0) 0.122 us | _raw_spin_lock_irq(); 0) | schedule_timeout() { 0) | schedule() { 0) | rcu_note_context_switch() { 0) 0.147 us | rcu_qs(); 0) 0.384 us | } 0) 0.124 us | _raw_spin_lock(); 0) 0.144 us | update_rq_clock(); 0) | deactivate_task() { 0) | dequeue_task_fair() { 0) | dequeue_entity() { 0) | update_curr() { 0) 0.121 us | update_min_vruntime(); 0) 0.132 us | cpuacct_charge(); 0) | __cgroup_account_cputime() { 0) 0.124 us | cgroup_rstat_updated(); 0) 0.354 us | } 0) 1.088 us | } 0) 0.127 us | __update_load_avg_se(); 0) 0.136 us | __update_load_avg_cfs_rq(); 0) 0.118 us | clear_buddies(); 0) 0.124 us | account_entity_dequeue(); 0) 0.120 us | update_cfs_group(); 0) 0.128 us | update_min_vruntime(); 0) 2.780 us | } 0) 0.114 us | hrtick_update(); 0) 3.242 us | } 0) 3.503 us | } 0) | pick_next_task_fair() { 0) 0.121 us | __msecs_to_jiffies(); 0) 0.389 us | } 0) | pick_next_task_idle() { 0) | put_prev_task_fair() { 0) | put_prev_entity() { 0) 0.122 us | check_cfs_rq_runtime(); 0) 0.355 us | } 0) 0.581 us | } 0) 0.123 us | __update_idle_core(); 0) 1.055 us | } 0) 0.118 us | enter_lazy_tlb(); 2) | finish_task_switch() { 2) | __mmdrop() { 2) | pgd_free() { 2) 0.248 us | _raw_spin_lock(); 2) | free_pages() { 2) | free_pages.part.0() { 2) | __free_pages() { 2) | __free_pages_ok() { 2) | free_one_page() { 2) 0.213 us | _raw_spin_lock(); 2) 0.219 us | __mod_zone_page_state(); 2) 1.102 us | } 2) 1.839 us | } 2) 2.337 us | } 2) 2.672 us | } 2) 2.986 us | } 2) 4.044 us | } 2) 0.218 us | destroy_context_ldt(); 2) | kmem_cache_free() { 2) 0.416 us | ___cache_free(); 2) 1.051 us | } 2) 6.234 us | } 2) 8.542 us | } 2) * 13308.49 us | } /* schedule */ 2) * 13308.85 us | } /* schedule_timeout */ 2) 0.235 us | _raw_spin_lock_irq(); 2) * 13310.47 us | } /* wait_for_completion_killable */ 2) | free_modprobe_argv() { 2) | kfree() { 2) 0.213 us | ___cache_free(); 2) 0.657 us | } 2) | kfree() { 2) 0.153 us | ___cache_free(); 2) 0.557 us | } 2) 1.732 us | } 2) | kfree() { 2) 0.207 us | ___cache_free(); 2) 0.734 us | } 2) | __wake_up() { 2) | __wake_up_common_lock() { 2) 0.265 us | _raw_spin_lock_irqsave(); 2) 0.199 us | __wake_up_common(); 2) 0.214 us | _raw_spin_unlock_irqrestore(); 2) 1.333 us | } 2) 1.648 us | } 2) * 13325.51 us | } /* call_usermodehelper_exec */ 2) | __wake_up() { 2) | __wake_up_common_lock() { 2) 0.195 us | _raw_spin_lock_irqsave(); 2) 0.200 us | __wake_up_common(); 2) 0.189 us | _raw_spin_unlock_irqrestore(); 2) 1.274 us | } 2) 1.629 us | } 2) * 13337.37 us | } /* __request_module */ 2) | __request_module() { 2) 0.279 us | current_is_async(); 2) | security_kernel_module_request() { 2) 0.310 us | integrity_kernel_module_request(); 2) 0.799 us | } 2) | kmem_cache_alloc_trace() { 2) | _cond_resched() { 2) 0.209 us | rcu_all_qs(); 2) 0.598 us | } 2) 0.208 us | should_failslab(); 2) | _cond_resched() { 2) 0.202 us | rcu_all_qs(); 2) 0.597 us | } 2) 0.206 us | memcg_kmem_put_cache(); 2) 2.596 us | } 2) | kstrdup() { 2) | __kmalloc_track_caller() { 2) 0.209 us | kmalloc_slab(); 2) | _cond_resched() { 2) 0.200 us | rcu_all_qs(); 2) 0.598 us | } 2) 0.399 us | should_failslab(); 2) | _cond_resched() { 2) 0.179 us | rcu_all_qs(); 2) 0.491 us | } 2) 0.174 us | memcg_kmem_put_cache(); 2) 3.004 us | } 2) 3.424 us | } 2) | call_usermodehelper_setup() { 2) | kmem_cache_alloc_trace() { 2) | _cond_resched() { 2) 0.171 us | rcu_all_qs(); 2) 0.816 us | } 2) 0.173 us | should_failslab(); 2) | _cond_resched() { 2) 0.188 us | rcu_all_qs(); 2) 0.544 us | } 2) 0.205 us | memcg_kmem_put_cache(); 2) 2.728 us | } 2) 3.100 us | } 2) | call_usermodehelper_exec() { 2) | queue_work_on() { 2) | __queue_work() { 2) 0.213 us | get_work_pool(); 2) 0.206 us | _raw_spin_lock(); 2) | insert_work() { 2) 0.194 us | get_pwq.isra.0(); 2) | wake_up_process() { 2) | try_to_wake_up() { 2) 0.248 us | _raw_spin_lock_irqsave(); 2) | select_task_rq_fair() { 2) 0.205 us | available_idle_cpu(); 2) 0.252 us | update_cfs_rq_h_load(); 2) | select_idle_sibling() { 2) 0.277 us | available_idle_cpu(); 2) 0.666 us | } 2) 2.074 us | } 2) 0.213 us | _raw_spin_lock(); 2) 0.240 us | update_rq_clock(); 2) | ttwu_do_activate() { 2) | activate_task() { 2) | enqueue_task_fair() { 2) | enqueue_entity() { 2) 0.209 us | update_curr(); 2) | __update_load_avg_se() { 2) 0.198 us | __accumulate_pelt_segments(); 2) 0.715 us | } 2) 0.221 us | __update_load_avg_cfs_rq(); 2) 0.205 us | update_cfs_group(); 2) 0.274 us | account_entity_enqueue(); 2) 0.209 us | place_entity(); 2) 0.211 us | __enqueue_entity(); 2) 3.718 us | } 2) 0.208 us | hrtick_update(); 2) 4.568 us | } 2) 5.005 us | } 2) | ttwu_do_wakeup() { 2) | check_preempt_curr() { 2) 0.222 us | resched_curr(); 2) 0.700 us | } 2) 1.473 us | } 2) 7.098 us | } 2) 0.218 us | _raw_spin_unlock_irqrestore(); 2) + 11.605 us | } 2) + 11.947 us | } 2) + 12.730 us | } 2) + 14.091 us | } 2) + 14.527 us | } 2) | wait_for_completion_killable() { 2) | _cond_resched() { 2) 0.210 us | rcu_all_qs(); 2) 0.636 us | } 2) 0.241 us | _raw_spin_lock_irq(); 2) | schedule_timeout() { 2) | schedule() { 2) | rcu_note_context_switch() { 2) 0.217 us | rcu_qs(); 2) 0.618 us | } 2) 0.200 us | _raw_spin_lock(); 2) 0.213 us | update_rq_clock(); 2) | deactivate_task() { 2) | dequeue_task_fair() { 2) | dequeue_entity() { 2) | update_curr() { 2) 0.196 us | update_min_vruntime(); 2) 0.239 us | cpuacct_charge(); 2) | __cgroup_account_cputime() { 2) 0.202 us | cgroup_rstat_updated(); 2) 0.638 us | } 2) 1.946 us | } 2) 0.210 us | __update_load_avg_se(); 2) 0.213 us | __update_load_avg_cfs_rq(); 2) 0.200 us | clear_buddies(); 2) 0.212 us | account_entity_dequeue(); 2) 0.203 us | update_cfs_group(); 2) 0.207 us | update_min_vruntime(); 2) 5.029 us | } 2) 0.194 us | hrtick_update(); 2) 5.793 us | } 2) 6.205 us | } 2) | pick_next_task_fair() { 2) | update_blocked_averages() { 2) 0.149 us | _raw_spin_lock_irqsave(); 2) 0.205 us | update_rq_clock(); 2) 0.167 us | __update_load_avg_cfs_rq(); 2) 0.206 us | update_rt_rq_load_avg(); 2) 0.184 us | update_dl_rq_load_avg(); 2) 0.183 us | _raw_spin_unlock_irqrestore(); 2) 2.510 us | } 2) | load_balance() { 2) | find_busiest_group() { 2) 0.172 us | update_nohz_stats(); 2) 0.176 us | idle_cpu(); 2) 0.207 us | update_nohz_stats(); 2) 1.818 us | } 2) 2.339 us | } 2) 0.171 us | __msecs_to_jiffies(); 2) | load_balance() { 2) | find_busiest_group() { 2) 0.211 us | update_nohz_stats(); 2) 0.203 us | update_nohz_stats(); 2) 0.190 us | idle_cpu(); 2) 0.216 us | update_nohz_stats(); 2) 0.222 us | update_nohz_stats(); 2) 2.711 us | } 2) 0.228 us | _raw_spin_lock_irqsave(); 2) 0.262 us | update_rq_clock(); 2) 0.233 us | can_migrate_task(); 2) 0.322 us | can_migrate_task(); 2) 0.222 us | update_cfs_rq_h_load(); 2) | deactivate_task() { 2) | dequeue_task_fair() { 2) | dequeue_entity() { 2) | update_curr() { 2) 0.209 us | update_min_vruntime(); 2) 0.259 us | cpuacct_charge(); 2) | __cgroup_account_cputime() { 2) 0.193 us | cgroup_rstat_updated(); 2) 0.738 us | } 2) 2.022 us | } 2) 0.223 us | __update_load_avg_se(); 2) 0.213 us | __update_load_avg_cfs_rq(); 2) 0.202 us | clear_buddies(); 2) 0.215 us | account_entity_dequeue(); 2) 0.204 us | update_cfs_group(); 2) 0.202 us | update_min_vruntime(); 2) 4.891 us | } 2) 0.205 us | hrtick_update(); 2) 5.711 us | } 2) 6.143 us | } 2) | set_task_cpu() { 2) | migrate_task_rq_fair() { 2) | detach_entity_cfs_rq() { 2) 0.215 us | __update_load_avg_se(); 2) 0.204 us | __update_load_avg_cfs_rq(); 2) 0.219 us | propagate_entity_cfs_rq.isra.0(); 2) 1.433 us | } 2) 1.814 us | } 2) 0.195 us | set_task_rq_fair(); 2) 2.664 us | } 2) 0.202 us | _raw_spin_lock(); 2) 0.219 us | update_rq_clock(); 2) | attach_task() { 2) | activate_task() { 2) | enqueue_task_fair() { 2) | enqueue_entity() { 2) | update_curr() { 2) 0.225 us | update_min_vruntime(); 2) 0.202 us | cpuacct_charge(); 2) | __cgroup_account_cputime() { 2) 0.196 us | cgroup_rstat_updated(); 2) 0.569 us | } 2) 2.120 us | } 2) 0.208 us | __update_load_avg_cfs_rq(); 2) 0.219 us | attach_entity_load_avg(); 2) 0.203 us | update_cfs_group(); 2) 0.184 us | account_entity_enqueue(); 2) 0.217 us | __enqueue_entity(); 2) 4.678 us | } 2) 0.177 us | hrtick_update(); 2) 5.408 us | } 2) 5.795 us | } 2) | check_preempt_curr() { 2) | check_preempt_wakeup() { 2) 0.205 us | update_curr(); 2) 0.201 us | wakeup_preempt_entity.isra.0(); 2) 0.970 us | } 2) 1.359 us | } 2) 7.688 us | } 2) + 23.511 us | } 2) 0.201 us | __msecs_to_jiffies(); 2) 0.205 us | _raw_spin_lock(); 2) 0.197 us | check_cfs_rq_runtime(); 2) | pick_next_entity() { 2) 0.198 us | clear_buddies(); 2) 0.600 us | } 2) | put_prev_entity() { 2) 0.188 us | check_cfs_rq_runtime(); 2) 0.583 us | } 2) | set_next_entity() { 2) 0.209 us | __update_load_avg_se(); 2) 0.215 us | __update_load_avg_cfs_rq(); 2) 1.053 us | } 2) + 33.704 us | } 2) | switch_mm_irqs_off() { 2) 0.607 us | load_new_mm_cr3(); 2) 2.798 us | } 3) 5.150 us | finish_task_switch(); -------------------------------------------------------------------------------- Ctrl-C hit, notice the delay -------------------------------------------------------------------------------- 3) $ 54568382 us | } /* schedule */ 3) $ 54568383 us | } /* schedule_timeout */ 3) 1.011 us | _raw_spin_lock_irq(); 3) $ 54568388 us | } /* wait_for_completion_killable */ 3) | __wake_up() { 3) | __wake_up_common_lock() { 3) 0.911 us | _raw_spin_lock_irqsave(); 3) 1.015 us | __wake_up_common(); 3) 0.958 us | _raw_spin_unlock_irqrestore(); 3) 6.186 us | } 3) 7.650 us | } 3) $ 54568414 us | } /* call_usermodehelper_exec */ 3) | __wake_up() { 3) | __wake_up_common_lock() { 3) 0.720 us | _raw_spin_lock_irqsave(); 3) 0.826 us | __wake_up_common(); 3) 0.850 us | _raw_spin_unlock_irqrestore(); 3) 5.655 us | } 3) 7.309 us | } 3) $ 54568437 us | } /* __request_module */