Re: [PATCH 2/5] xfs: allocate quotaoff transactions up front to avoid log deadlock

On Thu, Apr 27, 2017 at 08:03:27AM -0400, Brian Foster wrote:
> On Wed, Apr 26, 2017 at 02:23:33PM -0700, Darrick J. Wong wrote:
> > On Wed, Feb 15, 2017 at 10:40:44AM -0500, Brian Foster wrote:
> > > The quotaoff operation commits two explicitly synchronous
> > > transactions to correctly handle log recovery of dquots being
> > > modified at the time the quotaoff occurs. The first quotaoff
> > > transaction pins the tail of the log with a qoff logitem in the AIL
> > > to ensure further logged dquots stay ahead of the quotaoff operation
> > > in the log. The second quotaoff_end transaction is committed after
> > > the quotaoff operation completes; it releases the original log item
> > > and thus unpins the log.
> > > 
> > > Since the first transaction pins the tail of the log, this means a
> > > finite amount of space in the log is available between the time a
> > > quotaoff starts and completes before transaction reservations have
> > > to block on the log. While the quotaoff operation itself consumes no
> > > further log space, it is possible for other operations in the
> > > filesystem to consume the remaining log space before the quotaoff
> > > completes. If this occurs, the quotaoff_end transaction also blocks
> > > on the log which prevents the release of the log item and the
> > > filesystem deadlocks. This has been reproduced via repeated xfs/305
> > > iterations on a vm with fairly limited resources.
> > > 
> > > To avoid a deadlock due to a particularly slow quotaoff operation,
> > > allocate the quotaoff_end transaction immediately after the initial
> > > quotaoff transaction is committed. Carry a reference to the
> > > transaction through xfs_qm_scall_quotaoff() rather than the qoff log
> > > item and commit it once the quotaoff completes.
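> > > 
> > > In other words, the resulting flow is roughly the following
> > > (a simplified sketch of the change below; error handling omitted):
> > > 
> > >     /* commit the sync quotaoff start transaction and get back an
> > >      * already-allocated, not yet committed quotaoff end transaction */
> > >     error = xfs_qm_log_quotaoff(mp, &qend_tp, flags);
> > > 
> > >     /* ... release dquots, clear quota flags, purge the dquot cache ... */
> > > 
> > >     /* log the quotaoff end item and unpin the log */
> > >     error = xfs_trans_commit(qend_tp);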
> > 
> > Hmm... so instead of holding a pinned log item for however long it takes
> > to finish quotaoff, we hold an open transaction.  That does fix the
> > problem where we pin the log tail and the head crashes into it, but it
> > also comes with its own problems, doesn't it?
> > 
> > In xfs_qm_scall_quotaoff, we call xfs_qm_dqrele_all_inodes, which
> > calls xfs_dqrele_inode (and hence xfs_qm_dqrele) on every inode in the
> > system.  However, if any of those inodes are unlinked and inactive, the
> > iput -> xfs_fs_destroy_inode -> xfs_inactive path truncates the inode,
> > which itself requires a transaction, so now we're running with nested
> > transactions.
> > 
> > I'm not sure that's necessarily a problem because the outer transaction
> > contains only a single qoffend item.  AFAICT there aren't any other
> > places in XFS where a single thread runs nested transactions, but it
> > still seems suspect to me.
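> > 
> > Roughly, the chain I'm worried about is (see also the lockdep trace
> > below):
> > 
> >   xfs_qm_scall_quotaoff
> >     xfs_qm_log_quotaoff          <- allocates the qoffend trans, which stays held
> >     xfs_qm_dqrele_all_inodes
> >       xfs_dqrele_inode
> >         iput
> >           xfs_fs_destroy_inode
> >             xfs_inactive
> >               xfs_inactive_truncate
> >                 xfs_trans_alloc  <- nested transaction allocation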
> > 
> 
> Yeah, good spot. I wasn't expecting to be logging anything else in this
> window of time, but it looks like that is possible. We definitely don't
> want to be doing nested transactions. I'll have to think about this one
> some more.
> 
> Note that this was a one-off fix for something I hit while trying to
> reproduce the livelock addressed by the subsequent patches in the
> series. It's not a dependency for those three patches.

Oh, ok.  Thank you for pointing that out, as I didn't see a strong
connection between this patch and the next three, and was wondering if
there was one.

> > Heh, lockdep just spat this at me:
> > 
> 
> I'm guessing I just didn't have lockdep enabled when I tested this. To
> be sure... what is the reproducer for this? Thanks.

Running xfs/305 in a loop.
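(e.g. something along the lines of "while ./check xfs/305; do :; done"
from an fstests checkout)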

--D

> 
> Brian
> 
> > [19767.048839] =============================================
> > [19767.049699] [ INFO: possible recursive locking detected ]
> > [19767.050511] 4.11.0-rc4-djw #3 Not tainted
> > [19767.051122] ---------------------------------------------
> > [19767.051907] xfs_quota/4152 is trying to acquire lock:
> > [19767.052617]  (sb_internal){.+.+.+}, at: [<ffffffffa015ee13>] xfs_trans_alloc+0xe3/0x130 [xfs]
> > [19767.054151] 
> > [19767.054151] but task is already holding lock:
> > [19767.055246]  (sb_internal){.+.+.+}, at: [<ffffffffa015ee13>] xfs_trans_alloc+0xe3/0x130 [xfs]
> > [19767.056897] 
> > [19767.056897] other info that might help us debug this:
> > [19767.057980]  Possible unsafe locking scenario:
> > [19767.057980] 
> > [19767.058967]        CPU0
> > [19767.059385]        ----
> > [19767.060339]   lock(sb_internal);
> > [19767.061187]   lock(sb_internal);
> > [19767.061771] 
> > [19767.061771]  *** DEADLOCK ***
> > [19767.061771] 
> > [19767.062818]  May be due to missing lock nesting notation
> > [19767.062818] 
> > [19767.063967] 3 locks held by xfs_quota/4152:
> > [19767.064521]  #0:  (&type->s_umount_key#34){++++++}, at: [<ffffffff812287f0>] __get_super.part.18+0xc0/0xe0
> > [19767.065805]  #1:  (&qinf->qi_quotaofflock){+.+.+.}, at: [<ffffffffa0182a53>] xfs_qm_scall_quotaoff+0x53/0x6c0 [xfs]
> > [19767.067226]  #2:  (sb_internal){.+.+.+}, at: [<ffffffffa015ee13>] xfs_trans_alloc+0xe3/0x130 [xfs]
> > [19767.068543] 
> > [19767.068543] stack backtrace:
> > [19767.069191] CPU: 2 PID: 4152 Comm: xfs_quota Not tainted 4.11.0-rc4-djw #3
> > [19767.070124] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > [19767.071405] Call Trace:
> > [19767.071798]  dump_stack+0x85/0xc2
> > [19767.072298]  __lock_acquire+0x164e/0x16a0
> > [19767.072866]  ? kvm_sched_clock_read+0x9/0x20
> > [19767.073460]  ? sched_clock+0x9/0x10
> > [19767.073979]  lock_acquire+0xbf/0x210
> > [19767.074532]  ? xfs_trans_alloc+0xe3/0x130 [xfs]
> > [19767.075166]  __sb_start_write+0xc0/0x220
> > [19767.075782]  ? xfs_trans_alloc+0xe3/0x130 [xfs]
> > [19767.076475]  xfs_trans_alloc+0xe3/0x130 [xfs]
> > [19767.077138]  xfs_inactive_truncate+0x31/0x130 [xfs]
> > [19767.077862]  ? xfs_qm_dqattach+0x16/0x50 [xfs]
> > [19767.078524]  xfs_inactive+0x17d/0x270 [xfs]
> > [19767.079141]  xfs_fs_destroy_inode+0xb3/0x350 [xfs]
> > [19767.079964]  destroy_inode+0x3b/0x60
> > [19767.080679]  evict+0x132/0x190
> > [19767.081301]  iput+0x243/0x320
> > [19767.081961]  xfs_inode_ag_walk+0x2a8/0x690 [xfs]
> > [19767.082757]  ? xfs_inode_ag_walk+0x92/0x690 [xfs]
> > [19767.083545]  ? xfs_trans_free_dqinfo+0x30/0x30 [xfs]
> > [19767.084279]  ? xfs_perag_get+0xa8/0x2b0 [xfs]
> > [19767.084753]  ? xfs_perag_get+0xa8/0x2b0 [xfs]
> > [19767.085234]  xfs_inode_ag_iterator_flags+0x69/0xa0 [xfs]
> > [19767.086251]  ? xfs_trans_free_dqinfo+0x30/0x30 [xfs]
> > [19767.087219]  xfs_qm_dqrele_all_inodes+0x36/0x60 [xfs]
> > [19767.088431]  xfs_qm_scall_quotaoff+0x100/0x6c0 [xfs]
> > [19767.089285]  xfs_quota_disable+0x3d/0x50 [xfs]
> > [19767.090138]  SyS_quotactl+0x3ff/0x820
> > [19767.090853]  ? SYSC_newstat+0x3b/0x50
> > [19767.091501]  ? trace_hardirqs_on_caller+0x129/0x1b0
> > [19767.092398]  ? trace_hardirqs_on_thunk+0x1a/0x1c
> > [19767.093238]  entry_SYSCALL_64_fastpath+0x1f/0xc2
> > [19767.094150] RIP: 0033:0x7f43530a70ca
> > [19767.094851] RSP: 002b:00007ffe7570fbe8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3
> > [19767.096078] RAX: ffffffffffffffda RBX: 00007ffe7570fde8 RCX: 00007f43530a70ca
> > [19767.097242] RDX: 0000000000000000 RSI: 00000000012d26f0 RDI: 0000000000580202
> > [19767.098402] RBP: 0000000000000005 R08: 00007ffe7570fbfc R09: 0000000000005802
> > [19767.099645] R10: 00007ffe7570fbfc R11: 0000000000000206 R12: 0000000000401e50
> > [19767.100841] R13: 00007ffe7570fde0 R14: 0000000000000000 R15: 0000000000000000
> > 
> > --D
> > 
> > > 
> > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/xfs_qm_syscalls.c | 42 ++++++++++++++++++++++++------------------
> > >  1 file changed, 24 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> > > index 475a388..dbb6802 100644
> > > --- a/fs/xfs/xfs_qm_syscalls.c
> > > +++ b/fs/xfs/xfs_qm_syscalls.c
> > > @@ -35,9 +35,11 @@
> > >  #include "xfs_trace.h"
> > >  #include "xfs_icache.h"
> > >  
> > > -STATIC int	xfs_qm_log_quotaoff(xfs_mount_t *, xfs_qoff_logitem_t **, uint);
> > > -STATIC int	xfs_qm_log_quotaoff_end(xfs_mount_t *, xfs_qoff_logitem_t *,
> > > -					uint);
> > > +STATIC int	xfs_qm_log_quotaoff(struct xfs_mount *, struct xfs_trans **,
> > > +				    uint);
> > > +STATIC int	xfs_qm_log_quotaoff_end(struct xfs_mount *,
> > > +					struct xfs_qoff_logitem *,
> > > +					struct xfs_trans **, uint);
> > >  
> > >  /*
> > >   * Turn off quota accounting and/or enforcement for all udquots and/or
> > > @@ -56,7 +58,7 @@ xfs_qm_scall_quotaoff(
> > >  	uint			dqtype;
> > >  	int			error;
> > >  	uint			inactivate_flags;
> > > -	xfs_qoff_logitem_t	*qoffstart;
> > > +	struct xfs_trans	*qend_tp;
> > >  
> > >  	/*
> > >  	 * No file system can have quotas enabled on disk but not in core.
> > > @@ -128,7 +130,7 @@ xfs_qm_scall_quotaoff(
> > >  	 * and synchronously. If we fail to write, we should abort the
> > >  	 * operation as it cannot be recovered safely if we crash.
> > >  	 */
> > > -	error = xfs_qm_log_quotaoff(mp, &qoffstart, flags);
> > > +	error = xfs_qm_log_quotaoff(mp, &qend_tp, flags);
> > >  	if (error)
> > >  		goto out_unlock;
> > >  
> > > @@ -181,7 +183,7 @@ xfs_qm_scall_quotaoff(
> > >  	 * So, we have QUOTAOFF start and end logitems; the start
> > >  	 * logitem won't get overwritten until the end logitem appears...
> > >  	 */
> > > -	error = xfs_qm_log_quotaoff_end(mp, qoffstart, flags);
> > > +	error = xfs_trans_commit(qend_tp);
> > >  	if (error) {
> > >  		/* We're screwed now. Shutdown is the only option. */
> > >  		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > @@ -556,13 +558,14 @@ xfs_qm_scall_setqlim(
> > >  
> > >  STATIC int
> > >  xfs_qm_log_quotaoff_end(
> > > -	xfs_mount_t		*mp,
> > > -	xfs_qoff_logitem_t	*startqoff,
> > > +	struct xfs_mount	*mp,
> > > +	struct xfs_qoff_logitem	*startqoff,
> > > +	struct xfs_trans	**tpp,
> > >  	uint			flags)
> > >  {
> > > -	xfs_trans_t		*tp;
> > > +	struct xfs_trans	*tp;
> > >  	int			error;
> > > -	xfs_qoff_logitem_t	*qoffi;
> > > +	struct xfs_qoff_logitem	*qoffi;
> > >  
> > >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
> > >  	if (error)
> > > @@ -578,21 +581,22 @@ xfs_qm_log_quotaoff_end(
> > >  	 * We don't care about quotoff's performance.
> > >  	 */
> > >  	xfs_trans_set_sync(tp);
> > > -	return xfs_trans_commit(tp);
> > > +	*tpp = tp;
> > > +	return 0;
> > >  }
> > >  
> > >  
> > >  STATIC int
> > >  xfs_qm_log_quotaoff(
> > > -	xfs_mount_t	       *mp,
> > > -	xfs_qoff_logitem_t     **qoffstartp,
> > > -	uint		       flags)
> > > +	struct xfs_mount	*mp,
> > > +	struct xfs_trans	**end_tp,
> > > +	uint			flags)
> > >  {
> > > -	xfs_trans_t	       *tp;
> > > +	struct xfs_trans	*tp;
> > >  	int			error;
> > > -	xfs_qoff_logitem_t     *qoffi;
> > > +	struct xfs_qoff_logitem	*qoffi;
> > >  
> > > -	*qoffstartp = NULL;
> > > +	*end_tp = NULL;
> > >  
> > >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
> > >  	if (error)
> > > @@ -617,7 +621,9 @@ xfs_qm_log_quotaoff(
> > >  	if (error)
> > >  		goto out;
> > >  
> > > -	*qoffstartp = qoffi;
> > > +	error = xfs_qm_log_quotaoff_end(mp, qoffi, end_tp, flags);
> > > +	if (error)
> > > +		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > >  out:
> > >  	return error;
> > >  }
> > > -- 
> > > 2.7.4
> > > 