----- Original Message ----- > From: "Jan Kara" <jack@xxxxxxx> > Sent: Friday, October 7, 2016 3:08:38 AM > Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked) > > > So I believe this may be just a problem in overlayfs lockdep annotation > (see below). Added Miklos to CC. > > > Wait. There is also a lockep happened before the xfs internal error as > > well. > > > > [ 5839.452325] ====================================================== > > [ 5839.459221] [ INFO: possible circular locking dependency detected ] > > [ 5839.466215] 4.8.0-rc8-splice-fixw-proc+ #4 Not tainted > > [ 5839.471945] ------------------------------------------------------- > > [ 5839.478937] trinity-c220/69531 is trying to acquire lock: > > [ 5839.484961] (&p->lock){+.+.+.}, at: [<ffffffff812ac69c>] > > seq_read+0x4c/0x3e0 > > [ 5839.492967] > > but task is already holding lock: > > [ 5839.499476] (sb_writers#8){.+.+.+}, at: [<ffffffff81284be1>] > > __sb_start_write+0xd1/0xf0 > > [ 5839.508560] > > which lock already depends on the new lock. > > > > [ 5839.517686] > > the existing dependency chain (in reverse order) is: > > [ 5839.526036] > > -> #3 (sb_writers#8){.+.+.+}: > > [ 5839.530751] [<ffffffff810ff174>] lock_acquire+0xd4/0x240 > > [ 5839.537368] [<ffffffff810f8f4a>] percpu_down_read+0x4a/0x90 > > [ 5839.544275] [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0 > > [ 5839.551181] [<ffffffff812a8544>] mnt_want_write+0x24/0x50 > > [ 5839.557892] [<ffffffffa04a398f>] ovl_want_write+0x1f/0x30 > > [overlay] > > [ 5839.565577] [<ffffffffa04a6036>] ovl_do_remove+0x46/0x480 > > [overlay] > > [ 5839.573259] [<ffffffffa04a64a3>] ovl_unlink+0x13/0x20 [overlay] > > [ 5839.580555] [<ffffffff812918ea>] vfs_unlink+0xda/0x190 > > [ 5839.586979] [<ffffffff81293698>] do_unlinkat+0x268/0x2b0 > > [ 5839.593599] [<ffffffff8129419b>] SyS_unlinkat+0x1b/0x30 > > [ 5839.600120] [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0 > > [ 5839.606836] [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a > > [ 5839.614231] > > So here is IMO the real culprit: do_unlinkat() grabs fs freeze protection > through mnt_want_write(), we grab also i_rwsem in do_unlinkat() in > I_MUTEX_PARENT class a bit after that and further down in vfs_unlink() we > grab i_rwsem for the unlinked inode itself in default I_MUTEX class. Then > in ovl_want_write() we grab freeze protection again, but this time for the > upper filesystem. That establishes sb_writers (overlay) -> I_MUTEX_PARENT > (overlay) -> I_MUTEX (overlay) -> sb_writers (FS-A) lock ordering > (we maintain locking classes per fs type so that's why I'm showing fs type > in parenthesis). > > Now this nesting is nasty because once you add locks that are not tracked > per fs type into the mix, you get cycles. In this case we've got > seq_file->lock and cred_guard_mutex into the mix - the splice path is > doing sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex (splicing > from seq_file into the real filesystem). Exec path further establishes > cred_guard_mutex -> I_MUTEX (overlay) which closes the full cycle: > > sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex -> i_mutex > (overlay) -> sb_writers (FS-A) > > If I analyzed the lockdep trace, this looks like a real (although remote) > deadlock possibility. Miklos? So this can still be reproduced in the yesterday's mainline. [40581.813575] [ INFO: possible circular locking dependency detected ] [40581.813578] 4.9.0-rc1-lockfix-uncorev2+ #51 Tainted: G W [40581.813581] ------------------------------------------------------- [40581.813582] trinity-c104/39795 is trying to acquire lock: [40581.813587] ( [40581.813588] &p->lock [40581.813589] ){+.+.+.} [40581.813600] , at: [40581.813601] [<ffffffff8191588c>] seq_read+0xec/0x1400 [40581.813603] [40581.813603] but task is already holding lock: [40581.813605] ( [40581.813607] sb_writers [40581.813608] #8 [40581.813609] ){.+.+.+} [40581.813617] , at: [40581.813617] [<ffffffff81889c6a>] do_sendfile+0x9ea/0x1270 [40581.813618] [40581.813618] which lock already depends on the new lock. [40581.813618] [40581.813620] [40581.813620] the existing dependency chain (in reverse order) is: [40581.813623] [40581.813623] -> #3 [40581.813624] ( [40581.813625] sb_writers [40581.813626] #8 [40581.813628] ){.+.+.+} [40581.813628] : [40581.813636] [40581.813636] [<ffffffff8133dcda>] __lock_acquire+0x9aa/0x1710 [40581.813640] [40581.813640] [<ffffffff8133fd4e>] lock_acquire+0x24e/0x5d0 [40581.813644] [40581.813645] [<ffffffff8189037e>] __sb_start_write+0xae/0x360 [40581.813650] [40581.813650] [<ffffffff819066fa>] mnt_want_write+0x4a/0xc0 [40581.813661] [40581.813661] [<ffffffffa16cdfbd>] ovl_want_write+0x8d/0xf0 [overlay] [40581.813668] [40581.813668] [<ffffffffa16d4dc7>] ovl_do_remove+0xe7/0x9a0 [overlay] [40581.813675] [40581.813676] [<ffffffffa16d5696>] ovl_rmdir+0x16/0x20 [overlay] [40581.813680] [40581.813680] [<ffffffff818af90f>] vfs_rmdir+0x1bf/0x3e0 [40581.813685] [40581.813686] [<ffffffff818c5965>] do_rmdir+0x2c5/0x430 [40581.813689] [40581.813690] [<ffffffff818c8242>] SyS_unlinkat+0x22/0x30 [40581.813696] [40581.813696] [<ffffffff8100924d>] do_syscall_64+0x19d/0x540 [40581.813704] [40581.813704] [<ffffffff82c8af24>] return_from_SYSCALL_64+0x0/0x7a [40581.813707] [40581.813707] -> #2 [40581.813709] ( [40581.813710] &sb->s_type->i_mutex_key [40581.813711] #17 [40581.813712] ){++++++} [40581.813713] : [40581.813720] [40581.813720] [<ffffffff8133dcda>] __lock_acquire+0x9aa/0x1710 [40581.813726] [40581.813726] [<ffffffff8133fd4e>] lock_acquire+0x24e/0x5d0 [40581.813736] [40581.813736] [<ffffffff82c84261>] down_read+0xa1/0x1c0 [40581.813740] [40581.813740] [<ffffffff818ae2db>] lookup_slow+0x17b/0x4f0 [40581.813744] [40581.813744] [<ffffffff818bb228>] walk_component+0x728/0x1d10 [40581.813750] [40581.813750] [<ffffffff818bcc1e>] link_path_walk+0x40e/0x1690 [40581.813758] [40581.813758] [<ffffffff818c0274>] path_openat+0x1c4/0x3870 [40581.813764] [40581.813765] [<ffffffff818c6d19>] do_filp_open+0x1a9/0x2e0 [40581.813772] [40581.813772] [<ffffffff8189832b>] do_open_execat+0xcb/0x420 [40581.813783] [40581.813784] [<ffffffff8189932b>] open_exec+0x2b/0x50 [40581.813793] [40581.813793] [<ffffffff819ea78c>] load_elf_binary+0x103c/0x3550 [40581.813807] [40581.813807] [<ffffffff8189a852>] search_binary_handler+0x162/0x480 [40581.813814] [40581.813815] [<ffffffff818a106a>] do_execveat_common.isra.24+0x138a/0x2570 [40581.813823] [40581.813824] [<ffffffff818a2efa>] SyS_execve+0x3a/0x50 [40581.813828] [40581.813828] [<ffffffff8100924d>] do_syscall_64+0x19d/0x540 [40581.813833] [40581.813834] [<ffffffff82c8af24>] return_from_SYSCALL_64+0x0/0x7a [40581.813843] [40581.813843] -> #1 [40581.813845] ( [40581.813850] &sig->cred_guard_mutex [40581.813852] ){+.+.+.} [40581.813852] : [40581.813861] [40581.813862] [<ffffffff8133dcda>] __lock_acquire+0x9aa/0x1710 [40581.813871] [40581.813871] [<ffffffff8133fd4e>] lock_acquire+0x24e/0x5d0 [40581.813885] [40581.813886] [<ffffffff82c7d1d3>] mutex_lock_killable_nested+0x103/0xb90 [40581.813895] [40581.813896] [<ffffffff81a3f7a6>] do_io_accounting+0x186/0xcf0 [40581.813902] [40581.813903] [<ffffffff81a40329>] proc_tgid_io_accounting+0x19/0x20 [40581.813908] [40581.813909] [<ffffffff81a41494>] proc_single_show+0x114/0x1d0 [40581.813917] [40581.813917] [<ffffffff81915ad4>] seq_read+0x334/0x1400 [40581.813921] [40581.813921] [<ffffffff81884da6>] __vfs_read+0x106/0x990 [40581.813927] [40581.813927] [<ffffffff81886038>] vfs_read+0x118/0x400 [40581.813931] [40581.813931] [<ffffffff8188aebf>] SyS_read+0xdf/0x1d0 [40581.813938] [40581.813938] [<ffffffff8100924d>] do_syscall_64+0x19d/0x540 [40581.813945] [40581.813946] [<ffffffff82c8af24>] return_from_SYSCALL_64+0x0/0x7a [40581.813949] [40581.813949] -> #0 [40581.813951] ( [40581.813954] &p->lock [40581.813955] ){+.+.+.} [40581.813956] : [40581.813961] [40581.813961] [<ffffffff81337938>] validate_chain.isra.31+0x2b28/0x4c00 [40581.813965] [40581.813966] [<ffffffff8133dcda>] __lock_acquire+0x9aa/0x1710 [40581.813972] [40581.813972] [<ffffffff8133fd4e>] lock_acquire+0x24e/0x5d0 [40581.813977] [40581.813977] [<ffffffff82c7f2f8>] mutex_lock_nested+0x108/0xa50 [40581.813983] [40581.813983] [<ffffffff8191588c>] seq_read+0xec/0x1400 [40581.813993] [40581.813993] [<ffffffff81a7bdde>] kernfs_fop_read+0x35e/0x640 [40581.813998] [40581.813998] [<ffffffff818812ef>] do_loop_readv_writev+0xdf/0x250 [40581.814003] [40581.814003] [<ffffffff81886fb5>] do_readv_writev+0x6a5/0xab0 [40581.814007] [40581.814007] [<ffffffff81887446>] vfs_readv+0x86/0xe0 [40581.814020] [40581.814020] [<ffffffff8194fdac>] default_file_splice_read+0x49c/0xbb0 [40581.814026] [40581.814027] [<ffffffff8194eb74>] do_splice_to+0x104/0x1a0 [40581.814033] [40581.814033] [<ffffffff8194ee80>] splice_direct_to_actor+0x270/0xa00 [40581.814039] [40581.814039] [<ffffffff8194f7a4>] do_splice_direct+0x194/0x300 [40581.814046] [40581.814046] [<ffffffff818896e9>] do_sendfile+0x469/0x1270 [40581.814051] [40581.814051] [<ffffffff8188bcb0>] SyS_sendfile64+0x140/0x150 [40581.814054] [40581.814055] [<ffffffff8100924d>] do_syscall_64+0x19d/0x540 [40581.814059] [40581.814060] [<ffffffff82c8af24>] return_from_SYSCALL_64+0x0/0x7a [40581.814062] [40581.814062] other info that might help us debug this: [40581.814062] [40581.814066] Chain exists of: [40581.814066] [40581.814067] &p->lock [40581.814069] --> [40581.814070] &sb->s_type->i_mutex_key [40581.814071] #17 [40581.814073] --> [40581.814076] sb_writers [40581.814079] #8 [40581.814079] [40581.814079] [40581.814080] Possible unsafe locking scenario: [40581.814080] [40581.814081] CPU0 CPU1 [40581.814083] ---- ---- [40581.814085] lock( [40581.814088] sb_writers [40581.814089] #8 [40581.814089] ); [40581.814091] lock( [40581.814093] &sb->s_type->i_mutex_key [40581.814095] #17 [40581.814095] ); [40581.814097] lock( [40581.814098] sb_writers [40581.814099] #8 [40581.814099] ); [40581.814101] lock( [40581.814103] &p->lock [40581.814103] ); [40581.814104] [40581.814104] *** DEADLOCK *** [40581.814104] [40581.814106] 1 lock held by trinity-c104/39795: [40581.814109] #0: [40581.814111] ( [40581.814112] sb_writers [40581.814113] #8 [40581.814114] ){.+.+.+} [40581.814116] , at: [40581.814117] [<ffffffff81889c6a>] do_sendfile+0x9ea/0x1270 [40581.814118] [40581.814118] stack backtrace: [40581.814121] CPU: 25 PID: 39795 Comm: trinity-c104 Tainted: G W 4.9.0-rc1-lockfix-uncorev2+ #51 [40581.814123] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015 [40581.814131] ffff880825886da0 ffffffff81d37124 0000000041b58ab3 ffffffff83348dc7 [40581.814138] ffffffff81d37064 0000000000000001 0000000000000000 ffff8807b4d5d5d8 [40581.814145] 00000000bfc018be ffff880825886d78 0000000000000001 0000000000000000 [40581.814146] Call Trace: [40581.814155] [<ffffffff81d37124>] dump_stack+0xc0/0x12c [40581.814159] [<ffffffff81d37064>] ? _atomic_dec_and_lock+0xc4/0xc4 [40581.814168] [<ffffffff81332fa9>] print_circular_bug+0x3c9/0x5e0 [40581.814171] [<ffffffff81332be0>] ? print_circular_bug_entry+0xd0/0xd0 [40581.814176] [<ffffffff81337938>] validate_chain.isra.31+0x2b28/0x4c00 [40581.814182] [<ffffffff81334e10>] ? check_irq_usage+0x300/0x300 [40581.814192] [<ffffffff81334e10>] ? check_irq_usage+0x300/0x300 [40581.814196] [<ffffffff81de21f3>] ? __this_cpu_preempt_check+0x13/0x20 [40581.814200] [<ffffffff81336045>] ? validate_chain.isra.31+0x1235/0x4c00 [40581.814204] [<ffffffff8133a4d0>] ? print_usage_bug+0x700/0x700 [40581.814208] [<ffffffff812abdc0>] ? sched_clock_cpu+0x1b0/0x310 [40581.814214] [<ffffffff8133a4d0>] ? print_usage_bug+0x700/0x700 [40581.814219] [<ffffffff812abdc0>] ? sched_clock_cpu+0x1b0/0x310 [40581.814226] [<ffffffff8133dcda>] __lock_acquire+0x9aa/0x1710 [40581.814232] [<ffffffff8133fd4e>] lock_acquire+0x24e/0x5d0 [40581.814235] [<ffffffff8191588c>] ? seq_read+0xec/0x1400 [40581.814240] [<ffffffff8191588c>] ? seq_read+0xec/0x1400 [40581.814243] [<ffffffff82c7f2f8>] mutex_lock_nested+0x108/0xa50 [40581.814246] [<ffffffff8191588c>] ? seq_read+0xec/0x1400 [40581.814250] [<ffffffff8191588c>] ? seq_read+0xec/0x1400 [40581.814256] [<ffffffff817fedd6>] ? kasan_unpoison_shadow+0x36/0x50 [40581.814259] [<ffffffff82c7f1f0>] ? mutex_lock_interruptible_nested+0xb40/0xb40 [40581.814264] [<ffffffff8168ec2c>] ? get_page_from_freelist+0x175c/0x2ed0 [40581.814271] [<ffffffff8168d4d0>] ? __isolate_free_page+0x7e0/0x7e0 [40581.814275] [<ffffffff8133c3f9>] ? mark_held_locks+0x109/0x290 [40581.814278] [<ffffffff8191588c>] seq_read+0xec/0x1400 [40581.814283] [<ffffffff813ac01d>] ? rcu_lockdep_current_cpu_online+0x11d/0x1d0 [40581.814290] [<ffffffff819157a0>] ? seq_hlist_start_percpu+0x4a0/0x4a0 [40581.814295] [<ffffffff8198ef20>] ? __fsnotify_update_child_dentry_flags.part.0+0x2b0/0x2b0 [40581.814298] [<ffffffff81de21f3>] ? __this_cpu_preempt_check+0x13/0x20 [40581.814300] [<ffffffff81a7bdde>] kernfs_fop_read+0x35e/0x640 [40581.814305] [<ffffffff81b49a55>] ? selinux_file_permission+0x3c5/0x550 [40581.814310] [<ffffffff81a7ba80>] ? kernfs_fop_open+0xf40/0xf40 [40581.814312] [<ffffffff818812ef>] do_loop_readv_writev+0xdf/0x250 [40581.814318] [<ffffffff81886fb5>] do_readv_writev+0x6a5/0xab0 [40581.814324] [<ffffffff81886910>] ? vfs_write+0x5f0/0x5f0 [40581.814328] [<ffffffff81d8fbaf>] ? iov_iter_get_pages_alloc+0x53f/0x1990 [40581.814332] [<ffffffff81d8f670>] ? iov_iter_npages+0xed0/0xed0 [40581.814336] [<ffffffff8133c3f9>] ? mark_held_locks+0x109/0x290 [40581.814339] [<ffffffff81de21f3>] ? __this_cpu_preempt_check+0x13/0x20 [40581.814344] [<ffffffff8133caa0>] ? trace_hardirqs_on_caller+0x520/0x720 [40581.814347] [<ffffffff81887446>] vfs_readv+0x86/0xe0 [40581.814352] [<ffffffff8194fdac>] default_file_splice_read+0x49c/0xbb0 [40581.814361] [<ffffffff8194f910>] ? do_splice_direct+0x300/0x300 [40581.814363] [<ffffffff817fef3d>] ? kasan_kmalloc+0xad/0xe0 [40581.814366] [<ffffffff818a6287>] ? alloc_pipe_info+0x1b7/0x410 [40581.814371] [<ffffffff8133a4d0>] ? print_usage_bug+0x700/0x700 [40581.814373] [<ffffffff8188bcb0>] ? SyS_sendfile64+0x140/0x150 [40581.814377] [<ffffffff8100924d>] ? do_syscall_64+0x19d/0x540 [40581.814380] [<ffffffff82c8af24>] ? entry_SYSCALL64_slow_path+0x25/0x25 [40581.814382] [<ffffffff812abdc0>] ? sched_clock_cpu+0x1b0/0x310 [40581.814386] [<ffffffff8133c3f9>] ? mark_held_locks+0x109/0x290 [40581.814390] [<ffffffff8133caa0>] ? trace_hardirqs_on_caller+0x520/0x720 [40581.814395] [<ffffffff8198ef20>] ? __fsnotify_update_child_dentry_flags.part.0+0x2b0/0x2b0 [40581.814398] [<ffffffff81b49a55>] ? selinux_file_permission+0x3c5/0x550 [40581.814404] [<ffffffff81b26e96>] ? security_file_permission+0x176/0x220 [40581.814408] [<ffffffff81885c78>] ? rw_verify_area+0xd8/0x380 [40581.814411] [<ffffffff8194eb74>] do_splice_to+0x104/0x1a0 [40581.814415] [<ffffffff818a63b7>] ? alloc_pipe_info+0x2e7/0x410 [40581.814419] [<ffffffff8194ee80>] splice_direct_to_actor+0x270/0xa00 [40581.814424] [<ffffffff8194c5e0>] ? wakeup_pipe_readers+0x90/0x90 [40581.814429] [<ffffffff8194ec10>] ? do_splice_to+0x1a0/0x1a0 [40581.814432] [<ffffffff81885c78>] ? rw_verify_area+0xd8/0x380 [40581.814438] [<ffffffff8194f7a4>] do_splice_direct+0x194/0x300 [40581.814443] [<ffffffff8194f610>] ? splice_direct_to_actor+0xa00/0xa00 [40581.814450] [<ffffffff81278cee>] ? preempt_count_sub+0x5e/0xe0 [40581.814452] [<ffffffff81890415>] ? __sb_start_write+0x145/0x360 [40581.814457] [<ffffffff818896e9>] do_sendfile+0x469/0x1270 [40581.814461] [<ffffffff81889280>] ? do_compat_pwritev64.isra.16+0xd0/0xd0 [40581.814466] [<ffffffff814cb287>] ? __audit_syscall_exit+0x637/0x960 [40581.814469] [<ffffffff81006afb>] ? syscall_trace_enter+0x89b/0x1930 [40581.814473] [<ffffffff817f7993>] ? kfree+0x3f3/0x620 [40581.814477] [<ffffffff8188bcb0>] SyS_sendfile64+0x140/0x150 [40581.814479] [<ffffffff8188bb70>] ? SyS_sendfile+0x140/0x140 [40581.814482] [<ffffffff81de21f3>] ? __this_cpu_preempt_check+0x13/0x20 [40581.814485] [<ffffffff8188bb70>] ? SyS_sendfile+0x140/0x140 [40581.814487] [<ffffffff8100924d>] do_syscall_64+0x19d/0x540 [40581.814491] [<ffffffff82c8af24>] entry_SYSCALL64_slow_path+0x25/0x25 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html