The emergency thaw process uses iterate_supers(), which holds the sb->s_umount lock in read mode. The current thaw_super() code takes the sb->s_umount lock in write mode, hence leading to an instant deadlock.

Use the unlocked version of thaw_super(), __thaw_super(), to do the thawing and replace iterate_supers() with iterate_supers_write() so that the unfreeze operation can be performed with s_umount held for writing, as the locking rules for fsfreeze indicate.

As a bonus, by using thaw_super(), which does not nest, instead of thaw_bdev() we can get rid of the ugly while loop.

Jan Kara pointed out that with this approach we will leave the block devices frozen, but this is a problem we have had since the introduction of the superblock level API: if we thaw the filesystem using the superblock level API (be it through the thaw ioctl or emergency thaw), the bdev level freeze reference counter (bd_fsfreeze_count) will not be updated. Even though subsequent calls to thaw_bdev() will decrease it, it will never get back to 0: if thaw_super() returns an error, and it will when the superblock is already unfrozen, thaw_bdev() returns without decreasing the counter.

The solution I propose (and will be implementing in the followup patch "fsfreeze: freeze_super and thaw_bdev don't play well together") is letting bd_fsfreeze_count become zero when the superblock sitting on top of it is unfrozen, so that future calls to freeze_bdev() actually try to freeze the superblock.
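The counter problem described above can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions, not kernel code: struct model_bdev and the model_*() helpers are hypothetical stand-ins for the block device, freeze_bdev(), thaw_bdev() and thaw_super(), and all locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: one block device with one superblock on top. */
struct model_bdev {
	int fsfreeze_count;	/* stands in for bd_fsfreeze_count */
	bool sb_frozen;		/* frozen state of the superblock */
};

/* Superblock level thaw: fails if the superblock is not frozen. */
static int model_thaw_super(struct model_bdev *bdev)
{
	if (!bdev->sb_frozen)
		return -EINVAL;
	bdev->sb_frozen = false;
	return 0;
}

/* Bdev level freeze: only the first caller freezes the superblock. */
static void model_freeze_bdev(struct model_bdev *bdev)
{
	if (++bdev->fsfreeze_count > 1)
		return;
	bdev->sb_frozen = true;
}

/* Bdev level thaw: the counter is restored when the thaw fails. */
static int model_thaw_bdev(struct model_bdev *bdev)
{
	int error;

	if (!bdev->fsfreeze_count)
		return -EINVAL;
	if (--bdev->fsfreeze_count > 0)
		return 0;
	error = model_thaw_super(bdev);
	if (error)
		bdev->fsfreeze_count++;
	return error;
}

/*
 * Freeze nr_freezes times through the bdev API, thaw once through the
 * superblock API, retry the bdev level thaw, then freeze once more.
 */
struct model_bdev model_scenario(int nr_freezes)
{
	struct model_bdev bdev = { 0, false };
	int i;

	assert(nr_freezes >= 0);
	for (i = 0; i < nr_freezes; i++)
		model_freeze_bdev(&bdev);
	model_thaw_super(&bdev);		/* superblock level thaw */
	for (i = 0; i < nr_freezes + 2; i++)
		model_thaw_bdev(&bdev);		/* counter gets stuck at 1 */
	model_freeze_bdev(&bdev);		/* later freeze request */
	return bdev;
}
```

In this model, model_scenario(2) ends with the counter above 1 and the superblock unfrozen: the final freeze request only bumps the counter. model_scenario(0), where the superblock level thaw never interferes, ends with the superblock frozen as expected.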
Cc: linux-fsdevel@xxxxxxxxxxxxxxx
Cc: Josef Bacik <jbacik@xxxxxxxxxxxx>
Cc: Eric Sandeen <sandeen@xxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Cc: Jan Kara <jack@xxxxxxx>
Cc: Dave Chinner <dchinner@xxxxxxxxxx>
Cc: Luiz Capitulino <lcapitulino@xxxxxxxxxx>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@xxxxxxxxxxxxx>
---

diff -urNp linux-3.8-rc1-orig/drivers/tty/sysrq.c linux-3.8-rc1/drivers/tty/sysrq.c
--- linux-3.8-rc1-orig/drivers/tty/sysrq.c	2012-12-25 10:27:40.614737000 +0900
+++ linux-3.8-rc1/drivers/tty/sysrq.c	2012-12-25 11:40:06.128018000 +0900
@@ -363,7 +363,6 @@ static struct sysrq_key_op sysrq_moom_op
 	.enable_mask	= SYSRQ_ENABLE_SIGNAL,
 };
 
-#ifdef CONFIG_BLOCK
 static void sysrq_handle_thaw(int key)
 {
 	emergency_thaw_all();
@@ -374,7 +373,6 @@ static struct sysrq_key_op sysrq_thaw_op
 	.action_msg	= "Emergency Thaw of all frozen filesystems",
 	.enable_mask	= SYSRQ_ENABLE_SIGNAL,
 };
-#endif
 
 static void sysrq_handle_kill(int key)
 {
diff -urNp linux-3.8-rc1-orig/fs/buffer.c linux-3.8-rc1/fs/buffer.c
--- linux-3.8-rc1-orig/fs/buffer.c	2012-12-25 11:30:38.208018000 +0900
+++ linux-3.8-rc1/fs/buffer.c	2012-12-25 11:40:06.128018000 +0900
@@ -512,15 +512,33 @@ repeat:
 
 static void do_thaw_one(struct super_block *sb, void *unused)
 {
-	char b[BDEVNAME_SIZE];
-	while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
-		printk(KERN_WARNING "Emergency Thaw on %s\n",
+	int res;
+
+	if (sb->s_bdev) {
+		char b[BDEVNAME_SIZE];
+		printk(KERN_WARNING "Emergency Thaw on %s.\n",
 		       bdevname(sb->s_bdev, b));
+	}
+
+	/*
+	 * We got here from __iterate_supers with the superblock lock taken
+	 * so we can call the lockless version of thaw_super() safely.
+	 */
+	res = __thaw_super(sb);
+	if (!res) {
+		deactivate_locked_super(sb);
+		/*
+		 * We have to re-acquire s_umount because
+		 * iterate_supers_write() will unlock it. It still holds
+		 * passive reference so sb cannot be freed under us.
+		 */
+		down_write(&sb->s_umount);
+	}
 }
 
 static void do_thaw_all(struct work_struct *work)
 {
-	iterate_supers_read(do_thaw_one, NULL);
+	iterate_supers_write(do_thaw_one, NULL);
 	kfree(work);
 	printk(KERN_WARNING "Emergency Thaw complete\n");
 }
diff -urNp linux-3.8-rc1-orig/include/linux/fs.h linux-3.8-rc1/include/linux/fs.h
--- linux-3.8-rc1-orig/include/linux/fs.h	2012-12-25 11:35:55.488018000 +0900
+++ linux-3.8-rc1/include/linux/fs.h	2012-12-25 11:40:06.132018000 +0900
@@ -1881,6 +1881,7 @@ extern int vfs_ustat(dev_t, struct kstat
 extern int freeze_super(struct super_block *super);
 extern int __thaw_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
+extern void emergency_thaw_all(void);
 extern bool our_mnt(struct vfsmount *mnt);
 
 extern int current_umask(void);
@@ -2053,7 +2054,6 @@ extern void iterate_bdevs(void (*)(struc
 extern int sync_blockdev(struct block_device *bdev);
 extern void kill_bdev(struct block_device *);
 extern struct super_block *freeze_bdev(struct block_device *);
-extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
 #else