On Thu, May 23, 2013 at 01:09:02PM -0500, Chandra Seetharaman wrote:
> On Thu, 2013-05-23 at 09:41 +1000, Dave Chinner wrote:
> > On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> > > Hello,
> > >
> > > While testing and rearranging my pquota/gquota code, I stumbled on an
> > > xfs_shutdown() during a mount. But the mount just hung.
> > >
> > > I debugged and found that it is in a code path where
> > > &log->l_cilp->xc_ctx_lock is first acquired in read mode and, some
> > > levels down, the same semaphore is acquired in write mode, causing a
> > > deadlock.
> > >
> > > This is the stack:
> > > xfs_log_commit_cil        -> acquires &log->l_cilp->xc_ctx_lock in read mode
> > >   xlog_print_tic_res
> > >     xfs_force_shutdown
> > >       xfs_log_force_umount
> > >         xlog_cil_force
> > >           xlog_cil_force_lsn
> > >             xlog_cil_push_foreground
> > >               xlog_cil_push -> tries to acquire the same semaphore in write mode
> >
> > Which means you had a transaction reservation overrun. Is it
> > reproducible? Do you have the output from xlog_print_tic_res()?
> > Because:
>
> Here it is:
>
> May 23 10:48:52 test46 kernel: [   77.500728] XFS (sdh8): xlog_write: reservation summary:
> May 23 10:48:52 test46 kernel: [   77.500728]   trans type  = QM_SBCHANGE (26)
> May 23 10:48:52 test46 kernel: [   77.500728]   unit res    = 2740 bytes
> May 23 10:48:52 test46 kernel: [   77.500728]   current res = -48 bytes
> May 23 10:48:52 test46 kernel: [   77.500728]   total reg   = 0 bytes (o/flow = 0 bytes)
> May 23 10:48:52 test46 kernel: [   77.500728]   ophdrs      = 0 (ophdr space = 0 bytes)
> May 23 10:48:52 test46 kernel: [   77.500728]   ophdr + reg = 0 bytes
> May 23 10:48:52 test46 kernel: [   77.500728]   num regions = 0
> May 23 10:48:52 test46 kernel: [   77.500728]
>
> Yes, I can readily reproduce the problem, but it is with my mangled-up
> patchsets :). There is a small change that makes this problem reproduce
> consistently.

Interesting.
That implies that the CIL stole the reservation for the checkpoint
headers from this transaction's reservation, and it then overran by 48
bytes. An increase in the number of quotas should not affect this. What
is the xfs_info output on the filesystem that is triggering this?

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs