Re: deadlock with &log->l_cilp->xc_ctx_lock semaphore

Chandra Seetharaman <sekharan@xxxxxxxxxx> · Fri, 24 May 2013 16:41:11 -0500



On Fri, 2013-05-24 at 09:42 +1000, Dave Chinner wrote:
> On Thu, May 23, 2013 at 01:09:02PM -0500, Chandra Seetharaman wrote:
> > On Thu, 2013-05-23 at 09:41 +1000, Dave Chinner wrote:
> > > On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> > > > Hello,
> > > > 
> > > > While testing and rearranging my pquota/gquota code, I stumbled on an
> > > > xfs_shutdown() during a mount. But the mount just hung.
> > > > 
> > > > I debugged and found that it is in a code path where
> > > > &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
> > > > down the same semaphore is being acquired in write mode causing a
> > > > deadlock.
> > > > 
> > > > This is the stack:
> > > > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> > > >   xlog_print_tic_res
> > > >     xfs_force_shutdown
> > > >       xfs_log_force_umount
> > > >         xlog_cil_force
> > > >           xlog_cil_force_lsn
> > > >             xlog_cil_push_foreground
> > > >               xlog_cil_push - tries to acquire same semaphore in write mode
> > > 
> > > Which means you had a transaction reservation overrun. Is it
> > > reproducible? Do you have the output from xlog_print_tic_res()?
> > > Because:
> > 
> > Here it is:
> > 
> > May 23 10:48:52 test46 kernel: [   77.500728] XFS (sdh8): xlog_write: reservation summary:
> > May 23 10:48:52 test46 kernel: [   77.500728]   trans type  = QM_SBCHANGE (26)
> > May 23 10:48:52 test46 kernel: [   77.500728]   unit res    = 2740 bytes
> > May 23 10:48:52 test46 kernel: [   77.500728]   current res = -48 bytes
> > May 23 10:48:52 test46 kernel: [   77.500728]   total reg   = 0 bytes (o/flow = 0 bytes)
> > May 23 10:48:52 test46 kernel: [   77.500728]   ophdrs      = 0 (ophdr space = 0 bytes)
> > May 23 10:48:52 test46 kernel: [   77.500728]   ophdr + reg = 0 bytes
> > May 23 10:48:52 test46 kernel: [   77.500728]   num regions = 0
> > May 23 10:48:52 test46 kernel: [   77.500728]
> > 
> > Yes. I can readily reproduce the problem, but it is with my mangled up
> > patchsets :). There is a small change that makes this problem reproduce
> > consistently.
> 
> Interesting. That implies that the CIL stole the reservation for the
> checkpoint headers from this reservation, and then it overran by 48
> bytes. An increase in the number of quotas should not affect this.
> 
> What is the xfs_info output on the filesystem that is triggering
> this?

I have the same set of patches, but it is not happening any more :(. I
will keep trying.
> 
> Cheers,
> 
> Dave.
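
For illustration only, the read-then-write pattern in the stack quoted above can be modelled outside the kernel. This is a minimal sketch, not XFS code: ToyRWSem, try_read, try_write and commit_then_shutdown are hypothetical names, and the lock is a non-blocking toy, so the nested write request returns False where a real blocking semaphore would presumably sleep forever.

```python
class ToyRWSem:
    """Toy, non-blocking model of a reader/writer semaphore (not kernel code)."""

    def __init__(self):
        self.readers = 0      # number of read-mode holders
        self.writer = False   # True while a write-mode holder exists

    def try_read(self):
        # Readers may share the lock as long as no writer holds it.
        if self.writer:
            return False
        self.readers += 1
        return True

    def try_write(self):
        # A writer needs exclusive access: no readers and no other writer.
        if self.writer or self.readers > 0:
            return False
        self.writer = True
        return True

    def release_read(self):
        self.readers -= 1


def commit_then_shutdown(lock):
    """Mimic the reported call chain: hold read mode, then ask for write mode."""
    assert lock.try_read()         # stands in for xfs_log_commit_cil (assumption)
    got_write = lock.try_write()   # stands in for xlog_cil_push (assumption)
    lock.release_read()
    return got_write


ctx_lock = ToyRWSem()
nested_write_ok = commit_then_shutdown(ctx_lock)   # write requested under read
write_after_release_ok = ctx_lock.try_write()      # write requested after release
print(nested_write_ok, write_after_release_ok)
```

The write request only succeeds once the read hold has been dropped. In the toy model the caller's own read hold is what blocks the nested write request, which matches the hang described at the top of the thread.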

