On Tue, Oct 24, 2017 at 10:06:06AM -0400, Brian Foster wrote:
> On Mon, Oct 23, 2017 at 05:23:34PM -0700, Darrick J. Wong wrote:
> > On Mon, Oct 23, 2017 at 12:54:17PM -0400, Brian Foster wrote:
> > > On Mon, Oct 23, 2017 at 09:39:40AM -0700, Darrick J. Wong wrote:
> > > > On Mon, Oct 23, 2017 at 10:46:45AM -0400, Brian Foster wrote:
> > > > > Log recovery of v4 filesystems does not use buffer verifiers because
> > > > > log recovery historically can result in transient buffer corruption
> > > > > when target buffers might be ahead of the log after a crash. v5
> > > > > filesystems work around this problem with metadata LSN ordering.
> > > > >
> > > > > While the log recovery behavior is necessary on v4 supers, it
> > > > > currently can result in leaving buffers around in the LRU without
> > > > > verifiers attached for a significant amount of time. This can lead
> > > > > to use of unverified buffers while the filesystem is in active use,
> > > > > long after recovery has completed.
> > > > >
> > > > > To address this problem and provide a more consistent, clean
> > > > > post-mount buffer cache state, update the log mount sequence to
> > > > > unconditionally drain all buffers from the LRU as a final step.
> > > > >
> > > > > Reported-by: Darrick Wong <darrick.wong@xxxxxxxxxx>
> > > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > > > ---
> > > > >  fs/xfs/xfs_log.c | 13 +++++++++++++
> > > > >  1 file changed, 13 insertions(+)
> > > > >
> > > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > > > index dc95a49..e282fd8 100644
> > > > > --- a/fs/xfs/xfs_log.c
> > > > > +++ b/fs/xfs/xfs_log.c
> > > > > @@ -744,6 +744,7 @@ xfs_log_mount_finish(
> > > > >  {
> > > > >  	int		error = 0;
> > > > >  	bool		readonly = (mp->m_flags & XFS_MOUNT_RDONLY);
> > > > > +	bool		recovered = mp->m_log->l_flags & XLOG_RECOVERY_NEEDED;
> > > > >
> > > > >  	if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
> > > > >  		ASSERT(mp->m_flags & XFS_MOUNT_RDONLY);
> > > > > @@ -780,6 +781,18 @@ xfs_log_mount_finish(
> > > > >  	mp->m_super->s_flags &= ~MS_ACTIVE;
> > > > >  	evict_inodes(mp->m_super);
> > > > >
> > > > > +	/*
> > > > > +	 * Drain the buffer LRU after log recovery. This is required for v4
> > > > > +	 * filesystems to avoid leaving around buffers with NULL verifier ops,
> > > > > +	 * but we do it unconditionally to make sure we're always in a clean
> > > > > +	 * cache state after mount.
> > > > > +	 */
> > > > > +	if (recovered) {
> > > >
> > > > 	if (recovered && !error) { ?
> > > >
> > > > I observed that running xfs/376 on an rmap filesystem fails when it
> > > > tries to fuzz the high bit of u3.bmx[0].startoff. That triggers an
> > > > incorrect freeing of what is now a post-eof extent. The corresponding
> > > > rmap free operation fails after the RUI has been logged and shuts down
> > > > the filesystem, so a subsequent log recovery attempt also fails when it
> > > > tries to remove an rmap that doesn't exist. If we then try to force the
> > > > log we end up deadlocked somewhere... though if we /don't/ then memory
> > > > gets corrupted and the kernel blows up anyway. :(
> > > >
> > >
> > > Interesting... do you have a stack trace? I'm curious why forcing the
> > > log would hang here and not in the subsequent log force in
> > > xfs_log_unmount() -> xfs_log_quiesce().
> > >
> ...
> > ...after which I see this once before the system totally hangs:
> >
> > [ 5405.641103] INFO: task mount:28033 blocked for more than 60 seconds.
> > [ 5405.643982] Tainted: G W 4.14.0-rc6-xfsx #1
> > [ 5405.646067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 5405.650068] mount D12768 28033 27973 0x80000000
> > [ 5405.655505] Call Trace:
> > [ 5405.657210]  ? __schedule+0x40d/0xb20
> > [ 5405.658865]  schedule+0x40/0x90
> > [ 5405.661340]  xfs_ail_push_all_sync+0xa4/0xe0 [xfs]
> > [ 5405.664348]  ? remove_wait_queue+0x60/0x60
> > [ 5405.668815]  xfs_log_mount_finish+0xc7/0x110 [xfs]
> > [ 5405.671736]  xfs_mountfs+0x68f/0xaf0 [xfs]
> > [ 5405.674645]  xfs_fs_fill_super+0x4d7/0x630 [xfs]
> > [ 5405.677673]  ? xfs_finish_flags+0x150/0x150 [xfs]
> > [ 5405.679966]  mount_bdev+0x178/0x1b0
> > [ 5405.681300]  mount_fs+0xf/0x80
> > [ 5405.682426]  vfs_kern_mount+0x62/0x160
> > [ 5405.683712]  do_mount+0x1b1/0xd50
> > [ 5405.687634]  ? rcu_read_lock_sched_held+0x3f/0x70
> > [ 5405.688453]  SyS_mount+0x85/0xd0
> > [ 5405.689033]  entry_SYSCALL_64_fastpath+0x1f/0xbe
> > [ 5405.689955] RIP: 0033:0x7f0226c72b5a
> > [ 5405.690533] RSP: 002b:00007fff9ddea318 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
> > [ 5405.692074] RAX: ffffffffffffffda RBX: 00007f0226f6b63a RCX: 00007f0226c72b5a
> > [ 5405.695604] RDX: 0000000001a3b2d0 RSI: 0000000001a3b310 RDI: 0000000001a3b2f0
> > [ 5405.696566] RBP: 0000000001a3b120 R08: 0000000001a3b270 R09: 0000000000000012
> > [ 5405.699583] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00007f022717b83c
> > [ 5405.700465] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000005
> > [ 5405.702569]
> > [ 5405.702569] Showing all locks held in the system:
> > [ 5405.703485] 1 lock held by khungtaskd/37:
> > [ 5405.704095]  #0: (tasklist_lock){.+.+}, at: [<ffffffff810dfbf7>] debug_show_all_locks+0x37/0x190
> > [ 5405.705342] 2 locks held by agetty/810:
> > [ 5405.705845]  #0: (&tty->ldisc_sem){++++}, at: [<ffffffff813fecf4>] tty_ldisc_ref_wait+0x24/0x50
> > [ 5405.706934]  #1: (&ldata->atomic_read_lock){+.+.}, at: [<ffffffff813fd034>] n_tty_read+0xc4/0x880
> > [ 5405.708030] 1 lock held by in:imklog/864:
> > [ 5405.708534]  #0: (&f->f_pos_lock){+.+.}, at: [<ffffffff8125d8d8>] __fdget_pos+0x48/0x60
> > [ 5405.709572] 1 lock held by mount/28033:
> > [ 5405.710112]  #0: (&type->s_umount_key#45/1){+.+.}, at: [<ffffffff8123c993>] sget_userns+0x303/0x4f0
> > [ 5405.711210]
> > [ 5405.711408] =============================================
> > [ 5405.711408]
> >
>
> Thanks. Hmm, when I run xfs/376 on the djwong-devel xfs I see a bunch of
> mm/usercopy.c BUG()s and what looks like a mount hang during quotacheck,
> but it's blocked down in the block layer in io_schedule() attempting a
> readahead. I'm not terribly surprised by the lockup after all the BUG()
> invocations, but what I'm seeing doesn't appear to be similar to what
> you reproduce or related to the AIL in any way.
>
> I assume this mount behavior repeats after the test completes. If so,
> could you grab a metadump of the fs?
>

Ok, I can reproduce the problem now that I have the metadump. The AIL
push hangs because there's a pinned EFI sitting on the AIL. The
subsequent push succeeds because we cancel the recovery (which cancels
the intents) before we get to that point.

Given that, I agree that it makes sense to check for an error here.
Thanks!
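
Just to sketch what I have in mind for v2 (untested, so treat it as a
rough illustration of the suggested check rather than the final patch),
the drain would be gated on the recovery result roughly like this, with
the buftarg wait left unconditional as in the patch above:

	/*
	 * Skip the log force and sync AIL push if recovery failed.
	 * Recovered intents (e.g., the pinned EFI above) stay on the AIL
	 * until recovery is cancelled during the failed mount teardown,
	 * so a sync push here would never complete.
	 */
	if (recovered && !error) {
		xfs_log_force(mp, XFS_LOG_SYNC);
		xfs_ail_push_all_sync(mp->m_ail);
	}
	xfs_wait_buftarg(mp->m_ddev_targp);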
Brian

> Brian
>
> > --D
> >
> > >
> > > Brian
> > >
> > > > --D
> > > >
> > > > > +		xfs_log_force(mp, XFS_LOG_SYNC);
> > > > > +		xfs_ail_push_all_sync(mp->m_ail);
> > > > > +	}
> > > > > +	xfs_wait_buftarg(mp->m_ddev_targp);
> > > > > +
> > > > >  	if (readonly)
> > > > >  		mp->m_flags |= XFS_MOUNT_RDONLY;
> > > > >
> > > > > --
> > > > > 2.9.5
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html