On Thu 12-07-12 18:05:43, Fernando Luis Vázquez Cao wrote: > Changes from Dave Chinner's version: > - Remove s_frozen check in freeze_super which is not needed now that it is > re-entrant. > - Decrement freeze counter if the freeze_fs callback fails. > > --- > > thaw_bdev() has re-entrancy guards to allow freezes to nest > together. That is, it ensures that the filesystem is not thawed > until the last thaw command is issued. This is needed to prevent the > filesystem from being unfrozen while an existing freezer is still > operating on the filesystem in a frozen state (e.g. dm-snapshot). > > Currently, freeze_super() and thaw_super() bypasses these guards, > and as a result manual freezing and unfreezing via the ioctl methods > do not nest correctly. hence mixing userspace directed freezes with > block device level freezes result in inconsistency due to premature > thawing of the filesystem. > > Move the re-enterency guards to the superblock and into freeze_super > and thaw_super() so that userspace directed freezes nest correctly > again. Caveat: Things work as expected as long as direct calls to > thaw_super are always in response to a previous sb level freeze. In > other words an unpaired call to thaw_super can still thaw a > filesystem frozen using freeze_bdev (this issue could be addressed > in a follow-up patch if deemed necessary). > > This patch retains the bdev level mutex and counter to keep the > "feature" that we can freeze a block device that does not have a > filesystem mounted yet. > > Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx> > Cc: Jan Kara <jack@xxxxxxx> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > Signed-off-by: Fernando Luis Vazquez Cao <fernando@xxxxxxxxxxxxx> > --- Some mostly minor coments below. > diff -urNp vfs-orig/fs/block_dev.c vfs/fs/block_dev.c > --- vfs-orig/fs/block_dev.c 2012-07-12 14:31:38.936631141 +0900 > +++ vfs/fs/block_dev.c 2012-07-12 15:03:57.032627014 +0900 > @@ -257,16 +257,18 @@ int fsync_bdev(struct block_device *bdev > EXPORT_SYMBOL(fsync_bdev); > > /** > - * freeze_bdev -- lock a filesystem and force it into a consistent state > + * freeze_bdev -- lock a block device > * @bdev: blockdevice to lock > * > - * If a superblock is found on this device, we take the s_umount semaphore > - * on it to make sure nobody unmounts until the snapshot creation is done. > - * The reference counter (bd_fsfreeze_count) guarantees that only the last > - * unfreeze process can unfreeze the frozen filesystem actually when multiple > - * freeze requests arrive simultaneously. It counts up in freeze_bdev() and > - * count down in thaw_bdev(). When it becomes 0, thaw_bdev() will unfreeze > - * actually. > + * Locks the block device and, if present, the associated filesystem too. > + * > + * The reference counter (bd_fsfreeze_count) is used to implement the feature > + * that allows one to freeze a block device that does not have a filesystem > + * mounted yet. For filesystems using mount_bdev the kernel takes care of > + * things by preventing the mount operation from succeeding if the underlying > + * block device is frozen. Other filesystems should check this counter or risk > + * a situation where a freeze_bdev user (e.g. dm snapshot) and mount race, > + * which may lead to inconsistencies. > */ > struct super_block *freeze_bdev(struct block_device *bdev) > { > @@ -274,17 +276,7 @@ struct super_block *freeze_bdev(struct b > int error = 0; > > mutex_lock(&bdev->bd_fsfreeze_mutex); > - if (++bdev->bd_fsfreeze_count > 1) { > - /* > - * We don't even need to grab a reference - the first call > - * to freeze_bdev grab an active reference and only the last > - * thaw_bdev drops it. > - */ > - sb = get_super(bdev); > - drop_super(sb); > - mutex_unlock(&bdev->bd_fsfreeze_mutex); > - return sb; > - } > + bdev->bd_fsfreeze_count++; > > sb = get_active_super(bdev); > if (!sb) > @@ -297,30 +289,33 @@ struct super_block *freeze_bdev(struct b > return ERR_PTR(error); > } > deactivate_super(sb); > - out: > +out: > sync_blockdev(bdev); > mutex_unlock(&bdev->bd_fsfreeze_mutex); > - return sb; /* thaw_bdev releases s->s_umount */ > + return sb; > } > EXPORT_SYMBOL(freeze_bdev); > > /** > - * __thaw_bdev -- unlock filesystem > + * __thaw_bdev -- unlock a block device > * @bdev: blockdevice to unlock > * @sb: associated superblock > * @emergency: emergency thaw > * > - * Unlocks the filesystem and marks it writeable again after freeze_bdev(). > + * Unlocks the block device and, if present, the associated filesystem too. > */ > static int __thaw_bdev(struct block_device *bdev, struct super_block *sb, int emergency) > { > int error = -EINVAL; > > mutex_lock(&bdev->bd_fsfreeze_mutex); > + > if (!bdev->bd_fsfreeze_count) > goto out; > > - if (--bdev->bd_fsfreeze_count > 0 || !sb) { > + bdev->bd_fsfreeze_count--; > + > + if (!sb) { > error = 0; > goto out; > } > @@ -336,13 +331,6 @@ out: > return error; > } > > -/** > - * thaw_bdev -- unlock filesystem > - * @bdev: blockdevice to unlock > - * @sb: associated superblock > - * > - * Unlocks the filesystem and marks it writeable again after freeze_bdev(). > - */ > int thaw_bdev(struct block_device *bdev, struct super_block *sb) > { > return __thaw_bdev(bdev, sb, 0); It's a bit confusing (for reviewer) to remove the documentation here when you remove the whole function in a later patch... ... > @@ -1233,29 +1240,33 @@ static int __thaw_super(struct super_blo > { > int error = 0; > > - if (sb->s_frozen == SB_UNFROZEN) { > + mutex_lock(&sb->s_freeze_mutex); > + if (!sb->s_freeze_count) { > error = -EINVAL; > - goto out; > + goto out_unlock; > } > + sb->s_freeze_count = emergency ? 1 : sb->s_freeze_count; It would be cleaner to do this somewhere in do_thaw_one() and not here. Also you won't have to pass the emergency parameter then... Also it might be more logical for __thaw_super() to expect also s_freeze_mutex locked and handle the locking in thaw_super(). Honza -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html