Hi Oleg, On 06/08/2018 10:06 PM, Oleg Nesterov wrote: > Hello, > > I am travelling till the end of the next week, can't read this version > until I return. Just one question, > > On 06/06, Ravi Bangoria wrote: >> >> 1. One of the major reason was the deadlock between uprobe_lock and >> mm->mmap inside trace_uprobe_mmap(). That deadlock was not easy to fix > > Could you remind what exactly was wrong? > > I can't find your previous email about this problem, but iirc you didn't > explain the deadlock in details, just copied some traces from lockdep... The deadlock is between mm->mmap_sem and uprobe_lock. Some existing code path is taking these locks in following order: uprobe_lock event_mutex uprobe->register_rwsem dup_mmap_sem mm->mmap_sem I've introduced new function trace_uprobe_mmap() which gets called from mmap_region() / vma_adjust() with mm->mmap_sem already acquired. And it has to take uprobe_lock to loop over all trace_uprobes. i.e. the sequence is: mm->mmap_sem uprobe_lock Why it's difficult to resolve is because trace_uprobe_mmap() does not have control over mm->mmap_sem. Detailed trace from lockdep: [ 499.258006] ====================================================== [ 499.258205] WARNING: possible circular locking dependency detected [ 499.258409] 4.17.0-rc3+ #76 Not tainted [ 499.258528] ------------------------------------------------------ [ 499.258731] perf/6744 is trying to acquire lock: [ 499.258895] 00000000e4895f49 (uprobe_lock){+.+.}, at: trace_uprobe_mmap+0x78/0x130 [ 499.259147] [ 499.259147] but task is already holding lock: [ 499.259349] 000000009ec93a76 (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xe0/0x160 [ 499.259597] [ 499.259597] which lock already depends on the new lock. [ 499.259597] [ 499.259848] [ 499.259848] the existing dependency chain (in reverse order) is: [ 499.260086] [ 499.260086] -> #4 (&mm->mmap_sem){++++}: [ 499.260277] __lock_acquire+0x53c/0x910 [ 499.260442] lock_acquire+0xf4/0x2f0 [ 499.260595] down_write_killable+0x6c/0x150 [ 499.260764] copy_process.isra.34.part.35+0x1594/0x1be0 [ 499.260967] _do_fork+0xf8/0x910 [ 499.261090] ppc_clone+0x8/0xc [ 499.261209] [ 499.261209] -> #3 (&dup_mmap_sem){++++}: [ 499.261378] __lock_acquire+0x53c/0x910 [ 499.261540] lock_acquire+0xf4/0x2f0 [ 499.261669] down_write+0x6c/0x110 [ 499.261793] percpu_down_write+0x48/0x140 [ 499.261954] register_for_each_vma+0x6c/0x2a0 [ 499.262116] uprobe_register+0x230/0x320 [ 499.262277] probe_event_enable+0x1cc/0x540 [ 499.262435] perf_trace_event_init+0x1e0/0x350 [ 499.262587] perf_trace_init+0xb0/0x110 [ 499.262750] perf_tp_event_init+0x38/0x90 [ 499.262910] perf_try_init_event+0x10c/0x150 [ 499.263075] perf_event_alloc+0xbb0/0xf10 [ 499.263235] sys_perf_event_open+0x2a8/0xdd0 [ 499.263396] system_call+0x58/0x6c [ 499.263516] [ 499.263516] -> #2 (&uprobe->register_rwsem){++++}: [ 499.263723] __lock_acquire+0x53c/0x910 [ 499.263884] lock_acquire+0xf4/0x2f0 [ 499.264002] down_write+0x6c/0x110 [ 499.264118] uprobe_register+0x1ec/0x320 [ 499.264283] probe_event_enable+0x1cc/0x540 [ 499.264442] perf_trace_event_init+0x1e0/0x350 [ 499.264603] perf_trace_init+0xb0/0x110 [ 499.264766] perf_tp_event_init+0x38/0x90 [ 499.264930] perf_try_init_event+0x10c/0x150 [ 499.265092] perf_event_alloc+0xbb0/0xf10 [ 499.265261] sys_perf_event_open+0x2a8/0xdd0 [ 499.265424] system_call+0x58/0x6c [ 499.265542] [ 499.265542] -> #1 (event_mutex){+.+.}: [ 499.265738] __lock_acquire+0x53c/0x910 [ 499.265896] lock_acquire+0xf4/0x2f0 [ 499.266019] __mutex_lock+0xa0/0xab0 [ 499.266142] trace_add_event_call+0x44/0x100 [ 499.266310] create_trace_uprobe+0x4a0/0x8b0 [ 499.266474] trace_run_command+0xa4/0xc0 [ 499.266631] trace_parse_run_command+0xe4/0x200 [ 499.266799] probes_write+0x20/0x40 [ 499.266922] __vfs_write+0x6c/0x240 [ 499.267041] vfs_write+0xd0/0x240 [ 499.267166] ksys_write+0x6c/0x110 [ 499.267295] system_call+0x58/0x6c [ 499.267413] [ 499.267413] -> #0 (uprobe_lock){+.+.}: [ 499.267591] validate_chain.isra.34+0xbd0/0x1000 [ 499.267747] __lock_acquire+0x53c/0x910 [ 499.267917] lock_acquire+0xf4/0x2f0 [ 499.268048] __mutex_lock+0xa0/0xab0 [ 499.268170] trace_uprobe_mmap+0x78/0x130 [ 499.268335] uprobe_mmap+0x80/0x3b0 [ 499.268464] mmap_region+0x290/0x660 [ 499.268590] do_mmap+0x40c/0x500 [ 499.268718] vm_mmap_pgoff+0x114/0x160 [ 499.268870] ksys_mmap_pgoff+0xe8/0x2e0 [ 499.269034] sys_mmap+0x84/0xf0 [ 499.269161] system_call+0x58/0x6c [ 499.269279] [ 499.269279] other info that might help us debug this: [ 499.269279] [ 499.269524] Chain exists of: [ 499.269524] uprobe_lock --> &dup_mmap_sem --> &mm->mmap_sem [ 499.269524] [ 499.269856] Possible unsafe locking scenario: [ 499.269856] [ 499.270058] CPU0 CPU1 [ 499.270223] ---- ---- [ 499.270384] lock(&mm->mmap_sem); [ 499.270514] lock(&dup_mmap_sem); [ 499.270711] lock(&mm->mmap_sem); [ 499.270923] lock(uprobe_lock); [ 499.271046] [ 499.271046] *** DEADLOCK *** [ 499.271046] [ 499.271256] 1 lock held by perf/6744: [ 499.271377] #0: 000000009ec93a76 (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0xe0/0x160 [ 499.271628] [ 499.271628] stack backtrace: [ 499.271797] CPU: 25 PID: 6744 Comm: perf Not tainted 4.17.0-rc3+ #76 [ 499.272003] Call Trace: [ 499.272094] [c0000000e32d74a0] [c000000000b00174] dump_stack+0xe8/0x164 (unreliable) [ 499.272349] [c0000000e32d74f0] [c0000000001a905c] print_circular_bug.isra.30+0x354/0x388 [ 499.272590] [c0000000e32d7590] [c0000000001a3050] check_prev_add.constprop.38+0x8f0/0x910 [ 499.272828] [c0000000e32d7690] [c0000000001a3c40] validate_chain.isra.34+0xbd0/0x1000 [ 499.273070] [c0000000e32d7780] [c0000000001a57cc] __lock_acquire+0x53c/0x910 [ 499.273311] [c0000000e32d7860] [c0000000001a65b4] lock_acquire+0xf4/0x2f0 [ 499.273510] [c0000000e32d7930] [c000000000b1d1f0] __mutex_lock+0xa0/0xab0 [ 499.273717] [c0000000e32d7a40] [c0000000002b01b8] trace_uprobe_mmap+0x78/0x130 [ 499.273952] [c0000000e32d7a90] [c0000000002d7070] uprobe_mmap+0x80/0x3b0 [ 499.274153] [c0000000e32d7b20] [c0000000003550a0] mmap_region+0x290/0x660 [ 499.274353] [c0000000e32d7c00] [c00000000035587c] do_mmap+0x40c/0x500 [ 499.274560] [c0000000e32d7c80] [c00000000031ebc4] vm_mmap_pgoff+0x114/0x160 [ 499.274763] [c0000000e32d7d60] [c000000000352818] ksys_mmap_pgoff+0xe8/0x2e0 [ 499.275013] [c0000000e32d7de0] [c000000000016864] sys_mmap+0x84/0xf0 [ 499.275207] [c0000000e32d7e30] [c00000000000b404] system_call+0x58/0x6c ( Reference: https://lkml.org/lkml/2018/5/25/111 ) -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html