On 2012/10/08 22:57, Jan Kara wrote:
On Fri 05-10-12 14:35:53, Fernando Luis Vázquez Cao wrote:
The emergency thaw process uses iterate_supers() which holds the
sb->s_umount lock in read mode. The current thaw_super() code takes
the sb->s_umount lock in write mode, hence leading to an instant
deadlock.
Use the unlocked version of thaw_super() to do the thawing and replace
iterate_supers() with __iterate_supers() so that the unfreeze operation can
^^ iterate_supers_write()
Good catch.
be performed with s_umount held as the locking rules for fsfreeze indicate.
As a bonus, by using thaw_super(), which does not nest, instead of thaw_bdev()
we can get rid of the ugly while loop.
Jan Kara pointed out that with this approach we will leave the block devices
frozen, but this is a problem we have had since the introduction of the
superblock level API: if we thaw the filesystem using the superblock level API
(be it through the thaw ioctl or emergency thaw) the bdev level freeze
reference counter (bd_fsfreeze_count) will not be updated and even though
subsequent calls to thaw_bdev() will decrease it, it will never get back to 0
(if thaw_super() returns an error, and it will when the superblock is unfrozen,
thaw_bdev() will return without decreasing the counter). The solution I propose
(and will be implementing in the followup patch "fsfreeze: freeze_super and
thaw_bdev don't play well together") is letting bd_fsfreeze_count
become zero when the superblock sitting on top of it is unfrozen, so that
future calls to freeze_bdev() actually try to freeze the superblock.
Cc: Josef Bacik <jbacik@xxxxxxxxxxxx>
Cc: Eric Sandeen <sandeen@xxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Cc: Jan Kara <jack@xxxxxxx>
Cc: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@xxxxxxxxxxxxx>
---
diff -urNp linux-3.6.0-rc7-orig/fs/buffer.c linux-3.6.0-rc7/fs/buffer.c
--- linux-3.6.0-rc7-orig/fs/buffer.c 2012-09-26 13:20:14.842365056 +0900
+++ linux-3.6.0-rc7/fs/buffer.c 2012-09-26 15:02:22.630595704 +0900
@@ -513,15 +513,28 @@ repeat:
static void do_thaw_one(struct super_block *sb, void *unused)
{
- char b[BDEVNAME_SIZE];
- while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
- printk(KERN_WARNING "Emergency Thaw on %s\n",
+ int res;
+
+ if (sb->s_bdev) {
+ char b[BDEVNAME_SIZE];
+ printk(KERN_WARNING "Emergency Thaw on %s.\n",
bdevname(sb->s_bdev, b));
+ }
+
+ /* We got here from __iterate_supers with the superblock lock taken
+ * so we can call the lockless version of thaw_super() safely. */
+ res = __thaw_super(sb);
+ /* If we are going to drop the final active reference call
+ * deactivate_locked_super to clean things up. In the general case
+ * we avoid calling deactivate_locked_super() because it would release
+ * the superblock lock, which is __iterate_supers()'s job. */
+ if (!res && !atomic_add_unless(&sb->s_active, -1, 1))
+ deactivate_locked_super(sb);
This just looks wrong. When we *do* end up calling
deactivate_locked_super() we will return with sb unlocked which makes
iterate_supers_write() unlock already unlocked lock.
Thank you for the heads-up.
I missed the fact that ->kill_sb(), which gets called in
deactivate_locked_super(), will unlock the superblock indirectly via
generic_shutdown_super() or one of the wrappers around it
(kill_block_super(), kill_anon_super(), kill_litter_super()).
What I would put here is:
if (!res) {
deactivate_locked_super(sb);
/*
* We have to re-acquire s_umount because
* iterate_supers_write() will unlock it. It still holds
* passive reference so sb cannot be freed under us.
*/
down_write(&sb->s_umount);
}
Is there any problem with this that I am missing?
The reason I wrote the code as I did is that I did not want to re-acquire
s_umount in the normal case (s_active >= 2 entering the if statement).
What about combining our approaches and doing something like this?
if (!res && !atomic_add_unless(&sb->s_active, -1, 1)) {
deactivate_locked_super(sb);
down_write(&sb->s_umount);
}
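
To make the condition in the snippet above easier to follow, here is a minimal userspace model of it using C11 atomics. It is a sketch only: model_add_unless() imitates the documented behaviour of the kernel's atomic_add_unless() (add unless the value equals the given one, report whether the add happened), and needs_deactivate() is a hypothetical helper name, not something from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Model of atomic_add_unless(v, a, u): atomically add a to *v unless
 * *v == u. Returns true if the addition was performed.
 */
bool model_add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/*
 * Hypothetical helper mirroring the proposed condition. Returns true
 * when the caller holds the last active reference, i.e. when the slow
 * path (deactivate_locked_super() followed by re-taking s_umount)
 * would be needed. Otherwise the reference is dropped in place and
 * the lock is never released.
 */
bool needs_deactivate(atomic_int *s_active)
{
	return !model_add_unless(s_active, -1, 1);
}
```

With s_active at 2 the helper decrements the counter to 1 and returns false, so s_umount stays held throughout. With s_active at 1 the counter is left untouched and the helper returns true, which is the only case where the lock has to be re-acquired after deactivate_locked_super().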
Thanks,
Fernando