On Wed 13-11-13 05:59:11, Denys Fedoryshchenko wrote:
> Hi
>
> On 2013-11-12 23:46, Jan Kara wrote:
> >Hello,
> >
> >On Tue 12-11-13 16:34:07, Denys Fedoryshchenko wrote:
> >>I just did some fault testing for a test nbd setup, and found that if
> >>I reboot the nbd server I immediately get a BUG() message on the nbd
> >>client, and a filesystem that I cannot unmount; any operation on it
> >>will freeze and lock up processes trying to access it.
> >  So how exactly did you do the fault testing? It seems something
> >has discarded the block device under the filesystem's feet and the
> >superblock buffer_head got unmapped. Did something call the
> >NBD_CLEAR_SOCK ioctl? That calls kill_bdev(), which would do exactly
> >that...
> Client side:
> modprobe nbd
> nbd-client 2.2.2.29 /dev/nbd0 -name export1
> nbd-client 2.2.2.29 /dev/nbd1 -name export2
> nbd-client 2.2.2.29 /dev/nbd2 -name export3
> mount /dev/nbd0 /mnt/disk1
> mount /dev/nbd1 /mnt/disk2
> mount /dev/nbd2 /mnt/disk3
>
> On the server I have this config:
> [generic]
> [export1]
> exportname = /dev/sda1
> [export2]
> exportname = /dev/sdb1
> [export3]
> exportname = /dev/sdc1
>
> Steps to reproduce:
> 1) Start copying a large file on the client side to /mnt/disk1/.
> 2) Reboot the server. It reboots quite fast, in just a few seconds; the
>    server system gets an IP address before the nbd-server process starts
>    listening, so the nbd-client will probably see "connection refused".
> 3) It seems that when the client gets "connection refused", it goes mad.
>
> I can try to capture a traffic dump or do any other debug operation;
> please let me know what I should run :)
> P.S. I noticed that maybe I should run persist mode, but it should not
> crash like this anyway, I think.
  OK, no need for further debugging. I see what's going on. In the
NBD_DO_IT ioctl, nbd calls kill_bdev() after the kthread has returned -
and that is what happens in your case, as we can see from the "queue
cleared" messages. Now the question is how to fix this.
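For reference, the problematic path looks roughly like this (a sketch of
the NBD_DO_IT handling in drivers/block/nbd.c from memory - exact
function names and surrounding code may differ in your tree):

	case NBD_DO_IT:
		...
		/* worker thread submits queued requests over the socket */
		thread = kthread_create(nbd_thread, nbd, "%s",
					nbd->disk->disk_name);
		if (IS_ERR(thread))
			return PTR_ERR(thread);
		wake_up_process(thread);

		error = nbd_do_it(nbd);		/* returns once the socket dies */
		kthread_stop(thread);

		sock_shutdown(nbd, 0);
		nbd_clear_que(nbd);
		dev_warn(disk_to_dev(nbd->disk), "queue cleared\n");
		kill_bdev(bdev);		/* <-- discards the bdev's page
						 *     cache and buffers while
						 *     the filesystem is still
						 *     mounted on top of us */
		...

So when the server goes away and the socket dies, nbd_do_it() returns
and we unconditionally tear the device down under the mounted
filesystem.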
  Filesystems don't really expect device buffers to disappear under them
the way they do when nbd calls kill_bdev(). That also never happens with
normal block devices - if a similar situation happens to a SCSI / SATA
disk, the corresponding block device hangs around refusing any IO until
the filesystem is unmounted, and only at that point does it disappear
(when the device's refcount - bd_openers - reaches zero). It would be
good if NBD behaved the same way - maybe we should return from the
NBD_DO_IT ioctl only after bd_openers drops to 1 (not zero, because the
nbd client itself has the device open for the ioctl, if I'm right)?

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR