Re: [PATCH 3/4] xfs: drain the buffer LRU on mount

On Mon, Oct 23, 2017 at 05:23:34PM -0700, Darrick J. Wong wrote:
> On Mon, Oct 23, 2017 at 12:54:17PM -0400, Brian Foster wrote:
> > On Mon, Oct 23, 2017 at 09:39:40AM -0700, Darrick J. Wong wrote:
> > > On Mon, Oct 23, 2017 at 10:46:45AM -0400, Brian Foster wrote:
> > > > Log recovery on v4 filesystems does not use buffer verifiers because,
> > > > historically, recovery can result in transient buffer corruption when
> > > > target buffers are ahead of the log after a crash. v5 filesystems
> > > > work around this problem with metadata LSN ordering.
> > > > 
> > > > While this log recovery behavior is necessary on v4 superblocks, it
> > > > can currently leave buffers in the LRU without verifiers attached
> > > > for a significant amount of time. This can lead to unverified
> > > > buffers being used while the filesystem is in active use, long
> > > > after recovery has completed.
> > > > 
> > > > To address this problem and provide a clean, more consistent
> > > > post-mount buffer cache state, update the log mount sequence to
> > > > unconditionally drain all buffers from the LRU as a final step.
> > > > 
> > > > Reported-by: Darrick Wong <darrick.wong@xxxxxxxxxx>
> > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > > ---
> > > >  fs/xfs/xfs_log.c | 13 +++++++++++++
> > > >  1 file changed, 13 insertions(+)
> > > > 
> > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > > index dc95a49..e282fd8 100644
> > > > --- a/fs/xfs/xfs_log.c
> > > > +++ b/fs/xfs/xfs_log.c
> > > > @@ -744,6 +744,7 @@ xfs_log_mount_finish(
> > > >  {
> > > >  	int	error = 0;
> > > >  	bool	readonly = (mp->m_flags & XFS_MOUNT_RDONLY);
> > > > +	bool	recovered = mp->m_log->l_flags & XLOG_RECOVERY_NEEDED;
> > > >  
> > > >  	if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
> > > >  		ASSERT(mp->m_flags & XFS_MOUNT_RDONLY);
> > > > @@ -780,6 +781,18 @@ xfs_log_mount_finish(
> > > >  	mp->m_super->s_flags &= ~MS_ACTIVE;
> > > >  	evict_inodes(mp->m_super);
> > > >  
> > > > +	/*
> > > > +	 * Drain the buffer LRU after log recovery. This is required for v4
> > > > +	 * filesystems to avoid leaving around buffers with NULL verifier ops,
> > > > +	 * but we do it unconditionally to make sure we're always in a clean
> > > > +	 * cache state after mount.
> > > > +	 */
> > > > +	if (recovered) {
> > > 
> > > if (recovered && !error) { ?
> > > 
> > > I observed that running xfs/376 on an rmap filesystem fails when it
> > > tries to fuzz the high bit of u3.bmx[0].startoff.  That triggers an
> > > incorrect freeing of what is now a post-eof extent.  The corresponding
> > > rmap free operation fails after the RUI has been logged and shuts down
> > > the filesystem, so a subsequent log recovery attempt also fails when it
> > > tries to remove an rmap that doesn't exist.  If we then try to force the
> > > log we end up deadlocked somewhere... though if we /don't/ then memory
> > > gets corrupted and the kernel blows up anyway. :(
> > > 
> > 
> > Interesting... do you have a stack trace? I'm curious why forcing the
> > log would hang here and not in the subsequent log force in
> > xfs_log_unmount() -> xfs_log_quiesce().
> 
...
> ...after which I see this once before the system totally hangs:
> 
> [ 5405.641103] INFO: task mount:28033 blocked for more than 60 seconds.
> [ 5405.643982]       Tainted: G        W       4.14.0-rc6-xfsx #1
> [ 5405.646067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 5405.650068] mount           D12768 28033  27973 0x80000000
> [ 5405.655505] Call Trace:
> [ 5405.657210]  ? __schedule+0x40d/0xb20
> [ 5405.658865]  schedule+0x40/0x90
> [ 5405.661340]  xfs_ail_push_all_sync+0xa4/0xe0 [xfs]
> [ 5405.664348]  ? remove_wait_queue+0x60/0x60
> [ 5405.668815]  xfs_log_mount_finish+0xc7/0x110 [xfs]
> [ 5405.671736]  xfs_mountfs+0x68f/0xaf0 [xfs]
> [ 5405.674645]  xfs_fs_fill_super+0x4d7/0x630 [xfs]
> [ 5405.677673]  ? xfs_finish_flags+0x150/0x150 [xfs]
> [ 5405.679966]  mount_bdev+0x178/0x1b0
> [ 5405.681300]  mount_fs+0xf/0x80
> [ 5405.682426]  vfs_kern_mount+0x62/0x160
> [ 5405.683712]  do_mount+0x1b1/0xd50
> [ 5405.687634]  ? rcu_read_lock_sched_held+0x3f/0x70
> [ 5405.688453]  SyS_mount+0x85/0xd0
> [ 5405.689033]  entry_SYSCALL_64_fastpath+0x1f/0xbe
> [ 5405.689955] RIP: 0033:0x7f0226c72b5a
> [ 5405.690533] RSP: 002b:00007fff9ddea318 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
> [ 5405.692074] RAX: ffffffffffffffda RBX: 00007f0226f6b63a RCX: 00007f0226c72b5a
> [ 5405.695604] RDX: 0000000001a3b2d0 RSI: 0000000001a3b310 RDI: 0000000001a3b2f0
> [ 5405.696566] RBP: 0000000001a3b120 R08: 0000000001a3b270 R09: 0000000000000012
> [ 5405.699583] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00007f022717b83c
> [ 5405.700465] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000005
> [ 5405.702569] 
> [ 5405.702569] Showing all locks held in the system:
> [ 5405.703485] 1 lock held by khungtaskd/37:
> [ 5405.704095]  #0:  (tasklist_lock){.+.+}, at: [<ffffffff810dfbf7>] debug_show_all_locks+0x37/0x190
> [ 5405.705342] 2 locks held by agetty/810:
> [ 5405.705845]  #0:  (&tty->ldisc_sem){++++}, at: [<ffffffff813fecf4>] tty_ldisc_ref_wait+0x24/0x50
> [ 5405.706934]  #1:  (&ldata->atomic_read_lock){+.+.}, at: [<ffffffff813fd034>] n_tty_read+0xc4/0x880
> [ 5405.708030] 1 lock held by in:imklog/864:
> [ 5405.708534]  #0:  (&f->f_pos_lock){+.+.}, at: [<ffffffff8125d8d8>] __fdget_pos+0x48/0x60
> [ 5405.709572] 1 lock held by mount/28033:
> [ 5405.710112]  #0:  (&type->s_umount_key#45/1){+.+.}, at: [<ffffffff8123c993>] sget_userns+0x303/0x4f0
> [ 5405.711210] 
> [ 5405.711408] =============================================
> [ 5405.711408] 
> 

Thanks. Hmm, when I run xfs/376 on the djwong-devel xfs I see a bunch of
mm/usercopy.c BUG()s and what looks like a mount hang during quotacheck,
but it's blocked down in the block layer in io_schedule() attempting a
readahead. I'm not terribly surprised by the lockup after all the BUG()
invocations, but what I'm seeing doesn't appear to be similar to what
you reproduce or related to the AIL in any way.

I assume this mount behavior repeats after the test completes. If so,
could you grab a metadump of the fs?
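
Separately, regarding the "recovered && !error" suggestion above: for
reference, a rough, untested sketch of the tail of xfs_log_mount_finish()
with that check folded in (same helpers as in the posted patch) would be:

	/*
	 * Only force the log and push the AIL if recovery actually ran
	 * and succeeded; if recovery failed there's nothing sane to push
	 * from here. (Illustrative sketch only, not tested.)
	 */
	if (recovered && !error) {
		xfs_log_force(mp, XFS_LOG_SYNC);
		xfs_ail_push_all_sync(mp->m_ail);
	}
	xfs_wait_buftarg(mp->m_ddev_targp);

Whether xfs_wait_buftarg() should also be skipped (or the cache torn down
differently) when recovery fails is a separate question; the sketch above
keeps it unconditional, as in the posted patch.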

Brian

> --D
> 
> > 
> > Brian
> > 
> > > --D
> > > 
> > > > +		xfs_log_force(mp, XFS_LOG_SYNC);
> > > > +		xfs_ail_push_all_sync(mp->m_ail);
> > > > +	}
> > > > +	xfs_wait_buftarg(mp->m_ddev_targp);
> > > > +
> > > >  	if (readonly)
> > > >  		mp->m_flags |= XFS_MOUNT_RDONLY;
> > > >  
> > > > -- 
> > > > 2.9.5
> > > > 