On Thu, Dec 30, 2010 at 05:58:36PM +0530, Ajeet Yadav wrote:
> Kernel: 2.6.30.9
>
> I am trouble shooting a hang in XFS during umount.
> Test scenerio: Copy large number of files files using below script, and
> remove the USB after 3-5 second

FWIW, in future can you please report what kernel you are testing on?

>
> index=0
> while [ "$?" == 0 ]
> do
> index=$((index+1))
> sync
> cp $1/1KB.txt $2/"$index".test
> done
>
> In rare scenerio during USB unplug the umount process hang at xfs_buf_lock.
> Below log shows the hung process
>
> We have put printk to buffer handling functions xfs_buf_iodone_callbacks(),
> xfs_buf_error_relse(), xfs_buf_relse() and xfs_buf_rele()
>
> We always observed the hang only comes when bp->b_relse =
> xfs_buf_error_relse(). i.e when xfs_buf_iodone_callbacks() execute
> XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
> XFS_BUF_DONE(bp);
> XFS_BUF_FINISH_IOWAIT(bp);
>
> buf its never called by xfs_buf_relse() because b_hold = 3.
>
> Also we have seen that this problem always comes when bp->relse != NULL &&
> bp->hold > 1.

This appears to be the same problem as reported here:

http://oss.sgi.com/archives/xfs/2010-12/msg00380.html

> I do not know whether below prints will help you, but I have taken printk
> for super block buffer tracing
> S-functionname ( Start of function)
> E-functionname (End of function)

If you have a recent enough kernel, you can get all this information
from the tracing built into XFS.

As it is, the cause of the problem is that setting bp->b_relse
changes the behaviour of xfs_buf_relse() - if bp->b_relse is set, it
doesn't unlock the buffer. This is normally just fine, because
xfs_buf_rele() has a special case to handle buffers with bp->b_relse
set: when the hold count drops to zero, it adds a hold count and
calls the release function. The b_relse function is supposed to
unlock the buffer by calling xfs_buf_relse() again.

Unfortunately, the superblock buffer is special - the hold count on it
never drops to zero until very late in the unmount process because it
is managed by the filesystem. Hence the bp->b_relse function is never
called, and so the buffer is never unlocked in this case. Future
attempts to access it therefore hang.

I'll need to think about this one for a bit...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs