Re: [PATCH 2/5] xfs: allocate quotaoff transactions up front to avoid log deadlock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 26, 2017 at 02:23:33PM -0700, Darrick J. Wong wrote:
> On Wed, Feb 15, 2017 at 10:40:44AM -0500, Brian Foster wrote:
> > The quotaoff operation commits two explicitly synchronous
> > transactions to correctly handle log recovery of dquots being
> > modified at the time the quotaoff occurs. The first quotaoff
> > transaction pins the tail of the log with a qoff logitem in the AIL
> > to ensure further logged dquots stay ahead of the quotaoff operation
> > in the log. The second quotaoff_end transaction is committed after
> > the quotaoff operation completes, releases the original log item and
> > thus unpins the log.
> > 
> > Since the first transaction pins the tail of the log, this means a
> > finite amount of space in the log is available between the time a
> > quotaoff starts and completes before transaction reservations have
> > to block on the log. While the quotaoff operation itself consumes no
> > further log space, it is possible for other operations in the
> > filesystem to consume the remaining log space before the quotaoff
> > completes. If this occurs, the quotaoff_end transaction also blocks
> > on the log which prevents the release of the log item and the
> > filesystem deadlocks. This has been reproduced via repeated xfs/305
> > iterations on a vm with fairly limited resources.
> > 
> > To avoid a deadlock due to a particularly slow quotaoff operation,
> > allocate the quotaoff_end transaction immediately after the initial
> > quotaoff transaction is committed. Carry a reference to the
> > transaction through xfs_qm_scall_quotaoff() rather than the qoff log
> > item and commit it once the quotaoff completes.
> 
> Hmm... so instead of holding a pinned log item for however long it takes
> to finish quotaoff, we instead hold a transaction.  That does fix the
> problem where we pin the log tail and the head crashes into it, but also
> comes with its own problems, doesn't it?
> 
> In xfs_qm_scall_quotaoff, we call xfs_qm_dqrele_all_inodes, which
> calls xfs_dqrele_inode to call xfs_qm_dqrele on all the inodes in the
> system.  However, if any of those inodes are unlinked and inactive, the
> iput -> xfs_fs_destroy_inode -> xfs_inactive path truncates the inode,
> which itself requires a transaction, so now we're running with nested
> transactions.
> 
> I'm not sure that's necessarily a problem because the outer transaction
> contains only a single qoffend item.  AFAICT I haven't encountered any
> other places in XFS where a single thread runs nested transactions, but
> it still seems suspect to me.
> 

Yeah, good spot. I wasn't expecting to be logging anything else in this
window of time, but it looks like that is possible. We definitely don't
want to be doing nested transactions. I'll have to think about this one
some more.

Note that this was a one off fix for something I reproduced when trying
to reproduce the livelock addressed by the subsequent patches in the
series. It's not a dependency for the subsequent 3 patches.

> Heh, lockdep just spat this at me:
> 

I'm guessing I just didn't have lockdep enabled when I tested this. To
be sure... what is the reproducer for this? Thanks.

Brian

> [19767.048839] =============================================
> [19767.049699] [ INFO: possible recursive locking detected ]
> [19767.050511] 4.11.0-rc4-djw #3 Not tainted
> [19767.051122] ---------------------------------------------
> [19767.051907] xfs_quota/4152 is trying to acquire lock:
> [19767.052617]  (sb_internal){.+.+.+}, at: [<ffffffffa015ee13>] xfs_trans_alloc+0xe3/0x130 [xfs]
> [19767.054151] 
> [19767.054151] but task is already holding lock:
> [19767.055246]  (sb_internal){.+.+.+}, at: [<ffffffffa015ee13>] xfs_trans_alloc+0xe3/0x130 [xfs]
> [19767.056897] 
> [19767.056897] other info that might help us debug this:
> [19767.057980]  Possible unsafe locking scenario:
> [19767.057980] 
> [19767.058967]        CPU0
> [19767.059385]        ----
> [19767.060339]   lock(sb_internal);
> [19767.061187]   lock(sb_internal);
> [19767.061771] 
> [19767.061771]  *** DEADLOCK ***
> [19767.061771] 
> [19767.062818]  May be due to missing lock nesting notation
> [19767.062818] 
> [19767.063967] 3 locks held by xfs_quota/4152:
> [19767.064521]  #0:  (&type->s_umount_key#34){++++++}, at: [<ffffffff812287f0>] __get_super.part.18+0xc0/0xe0
> [19767.065805]  #1:  (&qinf->qi_quotaofflock){+.+.+.}, at: [<ffffffffa0182a53>] xfs_qm_scall_quotaoff+0x53/0x6c0 [xfs]
> [19767.067226]  #2:  (sb_internal){.+.+.+}, at: [<ffffffffa015ee13>] xfs_trans_alloc+0xe3/0x130 [xfs]
> [19767.068543] 
> [19767.068543] stack backtrace:
> [19767.069191] CPU: 2 PID: 4152 Comm: xfs_quota Not tainted 4.11.0-rc4-djw #3
> [19767.070124] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [19767.071405] Call Trace:
> [19767.071798]  dump_stack+0x85/0xc2
> [19767.072298]  __lock_acquire+0x164e/0x16a0
> [19767.072866]  ? kvm_sched_clock_read+0x9/0x20
> [19767.073460]  ? sched_clock+0x9/0x10
> [19767.073979]  lock_acquire+0xbf/0x210
> [19767.074532]  ? xfs_trans_alloc+0xe3/0x130 [xfs]
> [19767.075166]  __sb_start_write+0xc0/0x220
> [19767.075782]  ? xfs_trans_alloc+0xe3/0x130 [xfs]
> [19767.076475]  xfs_trans_alloc+0xe3/0x130 [xfs]
> [19767.077138]  xfs_inactive_truncate+0x31/0x130 [xfs]
> [19767.077862]  ? xfs_qm_dqattach+0x16/0x50 [xfs]
> [19767.078524]  xfs_inactive+0x17d/0x270 [xfs]
> [19767.079141]  xfs_fs_destroy_inode+0xb3/0x350 [xfs]
> [19767.079964]  destroy_inode+0x3b/0x60
> [19767.080679]  evict+0x132/0x190
> [19767.081301]  iput+0x243/0x320
> [19767.081961]  xfs_inode_ag_walk+0x2a8/0x690 [xfs]
> [19767.082757]  ? xfs_inode_ag_walk+0x92/0x690 [xfs]
> [19767.083545]  ? xfs_trans_free_dqinfo+0x30/0x30 [xfs]
> [19767.084279]  ? xfs_perag_get+0xa8/0x2b0 [xfs]
> [19767.084753]  ? xfs_perag_get+0xa8/0x2b0 [xfs]
> [19767.085234]  xfs_inode_ag_iterator_flags+0x69/0xa0 [xfs]
> [19767.086251]  ? xfs_trans_free_dqinfo+0x30/0x30 [xfs]
> [19767.087219]  xfs_qm_dqrele_all_inodes+0x36/0x60 [xfs]
> [19767.088431]  xfs_qm_scall_quotaoff+0x100/0x6c0 [xfs]
> [19767.089285]  xfs_quota_disable+0x3d/0x50 [xfs]
> [19767.090138]  SyS_quotactl+0x3ff/0x820
> [19767.090853]  ? SYSC_newstat+0x3b/0x50
> [19767.091501]  ? trace_hardirqs_on_caller+0x129/0x1b0
> [19767.092398]  ? trace_hardirqs_on_thunk+0x1a/0x1c
> [19767.093238]  entry_SYSCALL_64_fastpath+0x1f/0xc2
> [19767.094150] RIP: 0033:0x7f43530a70ca
> [19767.094851] RSP: 002b:00007ffe7570fbe8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3
> [19767.096078] RAX: ffffffffffffffda RBX: 00007ffe7570fde8 RCX: 00007f43530a70ca
> [19767.097242] RDX: 0000000000000000 RSI: 00000000012d26f0 RDI: 0000000000580202
> [19767.098402] RBP: 0000000000000005 R08: 00007ffe7570fbfc R09: 0000000000005802
> [19767.099645] R10: 00007ffe7570fbfc R11: 0000000000000206 R12: 0000000000401e50
> [19767.100841] R13: 00007ffe7570fde0 R14: 0000000000000000 R15: 0000000000000000
> 
> --D
> 
> > 
> > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_qm_syscalls.c | 42 ++++++++++++++++++++++++------------------
> >  1 file changed, 24 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> > index 475a388..dbb6802 100644
> > --- a/fs/xfs/xfs_qm_syscalls.c
> > +++ b/fs/xfs/xfs_qm_syscalls.c
> > @@ -35,9 +35,11 @@
> >  #include "xfs_trace.h"
> >  #include "xfs_icache.h"
> >  
> > -STATIC int	xfs_qm_log_quotaoff(xfs_mount_t *, xfs_qoff_logitem_t **, uint);
> > -STATIC int	xfs_qm_log_quotaoff_end(xfs_mount_t *, xfs_qoff_logitem_t *,
> > -					uint);
> > +STATIC int	xfs_qm_log_quotaoff(struct xfs_mount *, struct xfs_trans **,
> > +				    uint);
> > +STATIC int	xfs_qm_log_quotaoff_end(struct xfs_mount *,
> > +					struct xfs_qoff_logitem *,
> > +					struct xfs_trans **, uint);
> >  
> >  /*
> >   * Turn off quota accounting and/or enforcement for all udquots and/or
> > @@ -56,7 +58,7 @@ xfs_qm_scall_quotaoff(
> >  	uint			dqtype;
> >  	int			error;
> >  	uint			inactivate_flags;
> > -	xfs_qoff_logitem_t	*qoffstart;
> > +	struct xfs_trans	*qend_tp;
> >  
> >  	/*
> >  	 * No file system can have quotas enabled on disk but not in core.
> > @@ -128,7 +130,7 @@ xfs_qm_scall_quotaoff(
> >  	 * and synchronously. If we fail to write, we should abort the
> >  	 * operation as it cannot be recovered safely if we crash.
> >  	 */
> > -	error = xfs_qm_log_quotaoff(mp, &qoffstart, flags);
> > +	error = xfs_qm_log_quotaoff(mp, &qend_tp, flags);
> >  	if (error)
> >  		goto out_unlock;
> >  
> > @@ -181,7 +183,7 @@ xfs_qm_scall_quotaoff(
> >  	 * So, we have QUOTAOFF start and end logitems; the start
> >  	 * logitem won't get overwritten until the end logitem appears...
> >  	 */
> > -	error = xfs_qm_log_quotaoff_end(mp, qoffstart, flags);
> > +	error = xfs_trans_commit(qend_tp);
> >  	if (error) {
> >  		/* We're screwed now. Shutdown is the only option. */
> >  		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > @@ -556,13 +558,14 @@ xfs_qm_scall_setqlim(
> >  
> >  STATIC int
> >  xfs_qm_log_quotaoff_end(
> > -	xfs_mount_t		*mp,
> > -	xfs_qoff_logitem_t	*startqoff,
> > +	struct xfs_mount	*mp,
> > +	struct xfs_qoff_logitem	*startqoff,
> > +	struct xfs_trans	**tpp,
> >  	uint			flags)
> >  {
> > -	xfs_trans_t		*tp;
> > +	struct xfs_trans	*tp;
> >  	int			error;
> > -	xfs_qoff_logitem_t	*qoffi;
> > +	struct xfs_qoff_logitem	*qoffi;
> >  
> >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
> >  	if (error)
> > @@ -578,21 +581,22 @@ xfs_qm_log_quotaoff_end(
> >  	 * We don't care about quotoff's performance.
> >  	 */
> >  	xfs_trans_set_sync(tp);
> > -	return xfs_trans_commit(tp);
> > +	*tpp = tp;
> > +	return 0;
> >  }
> >  
> >  
> >  STATIC int
> >  xfs_qm_log_quotaoff(
> > -	xfs_mount_t	       *mp,
> > -	xfs_qoff_logitem_t     **qoffstartp,
> > -	uint		       flags)
> > +	struct xfs_mount	*mp,
> > +	struct xfs_trans	**end_tp,
> > +	uint			flags)
> >  {
> > -	xfs_trans_t	       *tp;
> > +	struct xfs_trans	*tp;
> >  	int			error;
> > -	xfs_qoff_logitem_t     *qoffi;
> > +	struct xfs_qoff_logitem	*qoffi;
> >  
> > -	*qoffstartp = NULL;
> > +	*end_tp = NULL;
> >  
> >  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
> >  	if (error)
> > @@ -617,7 +621,9 @@ xfs_qm_log_quotaoff(
> >  	if (error)
> >  		goto out;
> >  
> > -	*qoffstartp = qoffi;
> > +	error = xfs_qm_log_quotaoff_end(mp, qoffi, end_tp, flags);
> > +	if (error)
> > +		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> >  out:
> >  	return error;
> >  }
> > -- 
> > 2.7.4
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux