On Thu, 2013-05-23 at 09:41 +1000, Dave Chinner wrote:
> On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> > Hello,
> >
> > While testing and rearranging my pquota/gquota code, I stumbled on a
> > xfs_shutdown() during a mount. But the mount just hung.
> >
> > I debugged and found that it is in a code path where
> > &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
> > down the same semaphore is being acquired in write mode causing a
> > deadlock.
> >
> > This is the stack:
> > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> > xlog_print_tic_res
> > xfs_force_shutdown
> > xfs_log_force_umount
> > xlog_cil_force
> > xlog_cil_force_lsn
> > xlog_cil_push_foreground
> > xlog_cil_push - tries to acquire same semaphore in write mode
>
> Which means you had a transaction reservation overrun. Is it
> reproducible? Do you have the output from xlog_print_tic_res()?
> Because:

Here it is:

May 23 10:48:52 test46 kernel: [ 77.500728] XFS (sdh8): xlog_write: reservation summary:
May 23 10:48:52 test46 kernel: [ 77.500728] trans type = QM_SBCHANGE (26)
May 23 10:48:52 test46 kernel: [ 77.500728] unit res = 2740 bytes
May 23 10:48:52 test46 kernel: [ 77.500728] current res = -48 bytes
May 23 10:48:52 test46 kernel: [ 77.500728] total reg = 0 bytes (o/flow = 0 bytes)
May 23 10:48:52 test46 kernel: [ 77.500728] ophdrs = 0 (ophdr space = 0 bytes)
May 23 10:48:52 test46 kernel: [ 77.500728] ophdr + reg = 0 bytes
May 23 10:48:52 test46 kernel: [ 77.500728] num regions = 0
May 23 10:48:52 test46 kernel: [ 77.500728]

Yes. I can readily reproduce the problem, but it is with my mangled up
patchsets :). There is a small change that makes this problem reproduce
consistently.

> >
> > xfs_trans_commit+0x79/0x270 [xfs]
> > xfs_qm_write_sb_changes+0x61/0x90 [xfs]
> > xfs_qm_mount_quotas+0x82/0x180 [xfs]
> > xfs_mountfs+0x5f6/0x6b0 [xfs]
>
> This transaction only modifies the superblock, and it has a buffer
> reservation for a superblock sized buffer, and hence should never
> overrun.
>
> IOWs, I'm far more concerned about the fact there was a
> transaction overrun than that there was a hang in the path that handles

As I mentioned above, it may be a manipulation of my patch entanglement.

> the overrun. The fact this hang has been there since 2.6.35 tells
> you how rare transaction overruns are....
>
> FWIW, the fix for the hang is to make xlog_print_tic_res() return an
> error and have the caller handle the shutdown.
>
> Cheers,
>
> Dave.