On Fri, Oct 21, 2016 at 5:38 PM, CAI Qian <caiqian@xxxxxxxxxx> wrote: > > ----- Original Message ----- >> From: "Jan Kara" <jack@xxxxxxx> >> Sent: Friday, October 7, 2016 3:08:38 AM >> Subject: Re: local DoS - systemd hang or timeout (WAS: Re: [RFC][CFT] splice_read reworked) >> >> >> So I believe this may be just a problem in overlayfs lockdep annotation >> (see below). Added Miklos to CC. >> >> > Wait. There is also a lockep happened before the xfs internal error as >> > well. >> > >> > [ 5839.452325] ====================================================== >> > [ 5839.459221] [ INFO: possible circular locking dependency detected ] >> > [ 5839.466215] 4.8.0-rc8-splice-fixw-proc+ #4 Not tainted >> > [ 5839.471945] ------------------------------------------------------- >> > [ 5839.478937] trinity-c220/69531 is trying to acquire lock: >> > [ 5839.484961] (&p->lock){+.+.+.}, at: [<ffffffff812ac69c>] >> > seq_read+0x4c/0x3e0 >> > [ 5839.492967] >> > but task is already holding lock: >> > [ 5839.499476] (sb_writers#8){.+.+.+}, at: [<ffffffff81284be1>] >> > __sb_start_write+0xd1/0xf0 >> > [ 5839.508560] >> > which lock already depends on the new lock. >> > >> > [ 5839.517686] >> > the existing dependency chain (in reverse order) is: >> > [ 5839.526036] >> > -> #3 (sb_writers#8){.+.+.+}: >> > [ 5839.530751] [<ffffffff810ff174>] lock_acquire+0xd4/0x240 >> > [ 5839.537368] [<ffffffff810f8f4a>] percpu_down_read+0x4a/0x90 >> > [ 5839.544275] [<ffffffff81284be1>] __sb_start_write+0xd1/0xf0 >> > [ 5839.551181] [<ffffffff812a8544>] mnt_want_write+0x24/0x50 >> > [ 5839.557892] [<ffffffffa04a398f>] ovl_want_write+0x1f/0x30 >> > [overlay] >> > [ 5839.565577] [<ffffffffa04a6036>] ovl_do_remove+0x46/0x480 >> > [overlay] >> > [ 5839.573259] [<ffffffffa04a64a3>] ovl_unlink+0x13/0x20 [overlay] >> > [ 5839.580555] [<ffffffff812918ea>] vfs_unlink+0xda/0x190 >> > [ 5839.586979] [<ffffffff81293698>] do_unlinkat+0x268/0x2b0 >> > [ 5839.593599] [<ffffffff8129419b>] SyS_unlinkat+0x1b/0x30 >> > [ 5839.600120] [<ffffffff81003c9c>] do_syscall_64+0x6c/0x1e0 >> > [ 5839.606836] [<ffffffff817d4a3f>] return_from_SYSCALL_64+0x0/0x7a >> > [ 5839.614231] >> >> So here is IMO the real culprit: do_unlinkat() grabs fs freeze protection >> through mnt_want_write(), we grab also i_rwsem in do_unlinkat() in >> I_MUTEX_PARENT class a bit after that and further down in vfs_unlink() we >> grab i_rwsem for the unlinked inode itself in default I_MUTEX class. Then >> in ovl_want_write() we grab freeze protection again, but this time for the >> upper filesystem. That establishes sb_writers (overlay) -> I_MUTEX_PARENT >> (overlay) -> I_MUTEX (overlay) -> sb_writers (FS-A) lock ordering >> (we maintain locking classes per fs type so that's why I'm showing fs type >> in parenthesis). >> >> Now this nesting is nasty because once you add locks that are not tracked >> per fs type into the mix, you get cycles. In this case we've got >> seq_file->lock and cred_guard_mutex into the mix - the splice path is >> doing sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex (splicing >> from seq_file into the real filesystem). Exec path further establishes >> cred_guard_mutex -> I_MUTEX (overlay) which closes the full cycle: >> >> sb_writers (FS-A) -> seq_file->lock -> cred_guard_mutex -> i_mutex >> (overlay) -> sb_writers (FS-A) >> >> If I analyzed the lockdep trace, this looks like a real (although remote) >> deadlock possibility. Miklos? Yeah, you can leave out seq_file->lock, the chain can be made up from just 3 parts: unlink : i_mutex(ov) -> sb_writers(fs-a) splice: sb_writers(fs-a) ->cred_guard_mutex (though proc_tgid_io_accounting) exec: cred_guard_mutex -> i_mutex(ov) None of those are incorrect, but the cred_guard_mutex usage is also pretty weird: taken outside path lookup as well as inside ->read() in proc. Doesn't look a serious worry in practice, I don't think anybody would trigger the actual deadlock in a 1000years (an fs freeze is needed at just the right moment in addition to the above, very unlikely chain). Thanks, Miklos -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html