On Fri, Jul 01, 2011 at 10:00:54AM +0530, Amit Sahrawat wrote:
> On Thu, Jun 30, 2011 at 5:49 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Thu, Jun 30, 2011 at 04:57:42PM +0530, Amit Sahrawat wrote:
> > > Hi All,
> > > I encountered a hang on XFS during unplug.
> > > *Test Case:*
> > > #!/bin/sh
> > > index=0
> > > while [ "$?" == 0 ]
> > > do
> > > index=$(($index+1))
> > > sync
> > > cp /mnt/1KB.txt /tmp/"$index".test
> > > done
> > > Where /mnt - mount point for vfat and /tmp mount point for XFS, both can be
> > > XFS also.
> > >
> > > During this operation, unplug the USB. I am getting HANG almost everytime I
> > > unplug.
> >
> > Well, that's no surprise. The unplug appears to be losing IOs in
> > progress.
> >
> > > *Kernel Version:* 2.6.35.13 (extremely sorry, I know next question will be
> > > why am I not using TOT kernel - I tried but my PC does not boot up with the
> > > latest one)
.....
> > > *INFO: task khubd:*33 blocked for more than 120 seconds.
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > khubd D c06c261c 0 33 2 0x00000000
> > > Backtrace:
> > > [<c06c2210>] (schedule+0x0/0x500) from [<c0523f4c>]
> > > (_xfs_log_force+0x230/0x284)
> >
> > You need to turn off line wrapping for stuff you paste into email.
> > The cleaned up (i.e.
> > relevant part) trace is:
> >
> > [<c06c2210>] (schedule+0x0/0x500)
> > [<c0523d1c>] (_xfs_log_force+0x0/0x284)
> > [<c052417c>] (xfs_log_force+0x0/0x38)
> > [<c0544e94>] (xfs_sync_data+0x0/0x58)
> > [<c0544f20>] (xfs_quiesce_data+0x0/0x80)
> > [<c05421e4>] (xfs_fs_sync_fs+0x0/0xe0)
> > [<c048fa74>] (__sync_filesystem+0x0/0xa0)
> > [<c048fb88>] (sync_filesystem+0x0/0x60)
> > [<c0499104>] (fsync_bdev+0x0/0x44)
> > [<c056c680>] (invalidate_partition+0x0/0x3c)
> > [<c04b88e0>] (del_gendisk+0x0/0x140)
> > [<c05c78a0>] (sd_remove+0x0/0x84)
> > [<c05b27f4>] (__device_release_driver+0x0/0xac)
> > [<c05b2954>] (device_release_driver+0x0/0x30)
> > [<c05b1ddc>] (bus_remove_device+0x0/0x8c)
> > [<c05b02d8>] (device_del+0x0/0x170)
> > [<c05c4d5c>] (__scsi_remove_device+0x0/0x90)
> > [<c05c23bc>] (scsi_forget_host+0x0/0x6c)
> > [<c05bc38c>] (scsi_remove_host+0x0/0x104)
> > [<c0612f94>] (quiesce_and_remove_host+0x0/0x9c)
> > [<c06130b4>] (usb_stor_disconnect+0x0/0x28)
> > [<c0601614>] (usb_unbind_interface+0x0/0xdc)
> > [<c05b27f4>] (__device_release_driver+0x0/0xac)
> > [<c05b2954>] (device_release_driver+0x0/0x30)
> > [<c05b1ddc>] (bus_remove_device+0x0/0x8c)
> > [<c05b02d8>] (device_del+0x0/0x170)
> > [<c05ff06c>] (usb_disable_device+0x0/0xf8)
> > [<c05fa8e0>] (usb_disconnect+0x0/0xf4)
> > [<c05fabd8>] (hub_thread+0x0/0xd78)
> > [<c041e61c>] (kthread+0x0/0x8c)
> >
> > Well, that just looks utterly braindamaged to me.
> >
> > We just had the device containing the filesystem removed from the
> > system, so the error handling routine ends up trying to sync the
> > filesystem to the device that doesn't exist anymore. WTF?
> >
> > >>> This is what I think, why is syncing taking place when the

Amit, you don't need to quote your own reply. That just confuses
mail readers that understand the ">" quoting convention and
highlight appropriately, and made me wonder if you'd even
replied....

> This is what I think, why is syncing taking place when the
> device doesn't exist anymore.
> What is the gain in doing so?

I doubt the person who wrote the error handling even realised that
it ended up in such a mess.

> I
> will try and propose this feature.

Not sure what you mean by this....

....

> > AFAICT, this problem doesn't exist in TOT - the conversion of the
>
> Again I have a problem which seems fixed in TOT :)
>
> > xfslogd workqueue to CMWQ allows processing of other xfslogd
> > workqueue events to continue even though this one has gone to sleep.
> >
> > You probably need to change the shutdown type to
> > SHUTDOWN_LOG_IO_ERROR to prevent a log flush from occurring in this
> > shutdown context.
>
> This will fix the error for this kernel version, I will give this a try.
> Is this the patchwork for CMWQ:
> http://patchwork.xfs.org/patch/2037/ (xfs: improve sync behaviour
> in face of aggressive dirtying) ? Please let me know.

No. 2.6.35 doesn't have the CMWQ infrastructure, it was introduced
in 2.6.38 IIRC. IOWs, there isn't a fix you can just backport -
you're going to need to write and test your own fix, and my
suggestion for doing that is above.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs