On Fri, Sep 14, 2012 at 10:46:33AM +0900, Fernando Luis Vazquez Cao wrote:
> On 2012/09/14 09:15, Dave Chinner wrote:
> >On Thu, Sep 13, 2012 at 05:19:21PM +0900, Fernando Luis Vazquez Cao wrote:
> >>The problem is that we allow users to umount a frozen
> >>filesystem but we have neither a bdev level fsfreeze
> >>API to thaw it after that nor check ioctls.
> >>2- Provide a bdev level ioctl to unfreeze an umounted filesystem.
> >
> >The only time a block device knows about the frozen state of the
> >superblock is when dm-snapshot drives the freeze from the block
> >device. There are special hooks in, e.g. mount_bdev(), for checking
> >whether there is an active bdev freeze, and this prevents new mounts
> >from occurring while a bdev freeze is in progress.

> Yes, and I tried to retain that feature in my patch set.

> >DM also removes the freeze state itself, so this is not a persistent
> >state.

> Unfortunately freeze_bdev and thaw_bdev are symbols
> exported to modules.

I don't see how that is relevant.

> >IOWs, there are two specific freeze types - one a superblock (user)
> >level freeze, and the other a block device (kernel) level freeze.
> >What you are proposing here means that the user, in certain
> >circumstances, needs to manipulate superblock level freezes from the
> >block level because the superblock is no longer visible to the
> >user. It's a recipe for confusion and convoluted bugs, and it sure
> >as hell won't work for all filesystems. e.g. see 18e9e51 ("Introduce
> >freeze_super and thaw_super for the fsfreeze ioctl") as the reason
> >why superblock level freezes exist and why trying to thaw from the
> >bdev level doesn't work.

> I discussed the issue mentioned in that commit with Josef
> Bacik and Chris Mason and decided to fix those filesystems
> for which it would not work (we need to fix the btrfs issue
> anyway).

Fixing btrfs doesn't address the layering problem.
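[To make the two freeze levels concrete, here is a small illustrative userspace model - not kernel code; the struct and function names are invented - of how a superblock (user) level freeze and a block device (kernel) level freeze interact through the bdev's freeze count:]

```c
#include <assert.h>
#include <errno.h>

/*
 * Toy model of the two freeze levels. Real kernel state lives in
 * struct block_device (bd_fsfreeze_count) and struct super_block;
 * this just sketches the nesting semantics.
 */
struct bdev_model {
	int bd_fsfreeze_count;	/* bdev (kernel) level freeze nesting */
	int sb_frozen;		/* superblock (user) level freeze state */
};

/* bdev-level freeze, nested the way freeze_bdev() nests it */
static void model_freeze_bdev(struct bdev_model *b)
{
	b->bd_fsfreeze_count++;
	b->sb_frozen = 1;
}

/*
 * bdev-level thaw: only drops the superblock freeze when the nesting
 * count says this was the last bdev-level reference.
 */
static int model_thaw_bdev(struct bdev_model *b)
{
	if (b->bd_fsfreeze_count == 0)
		return -EINVAL;	/* nothing to thaw at the bdev level */
	if (--b->bd_fsfreeze_count == 0)
		b->sb_frozen = 0;
	return 0;
}

/* superblock-level freeze via FIFREEZE: bypasses the bdev count */
static void model_fifreeze(struct bdev_model *b)
{
	b->sb_frozen = 1;
}
```

[In this model, a FIFREEZE-style freeze leaves bd_fsfreeze_count at zero, so a count-honouring bdev-level thaw has nothing to drop even though the superblock is frozen - which is the layering mismatch under discussion.]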
> >Indeed, what happens when the superblock freeze is driven from
> >dm-snapshot, and the user unmounts the fs and runs the blockdev
> >ioctl to drop the freeze reference that dm-snapshot holds?
> >That would free the superblock out from underneath DM, so this is
> >a can of worms I'd prefer not to open.

> The prototype I wrote for the blockdev ioctl
> grabs the bdev->bd_fsfreeze_mutex and checks
> bdev->bd_fsfreeze_count. If bd_fsfreeze_count > 1
> it returns EBUSY, else it calls thaw_super so
> that the superblock is thawed and subsequently released.

So if I run FIFREEZE, then unmount, bdev->bd_fsfreeze_count is zero,
so BLKTHAW won't thaw the superblock. If you decide to ignore
bd_fsfreeze_count and call thaw_super() anyway, then you don't have
any nesting sanity at all.

> I hope this addresses your concerns. That said,
> I am not married to this approach; if we decide
> that filesystems should be thawed on
> umount (which I was told is a no-go at a previous
> LSF) I am willing to implement it.

I don't recall it ever being discussed. If filesystems are supposed to
maintain frozen state during mount and unmount (which is what this
implies), then we've got a lot of work to do in every single
filesystem to audit and ensure that the mount/unmount paths never
write anything to disk... It's far simpler, and IMO much saner, simply
to say unmount == thaw.

> >>I will also:
> >>3- add the check ioctls so that users can check whether a
> >>filesystem is frozen or not and avoid surprises.
> >
> >Which only solves the problem for users that know they have to check
> >the state in certain corner cases. That doesn't fix the underlying
> >problem.
> >
> >The reason this problem exists is that an active superblock level
> >freeze holds a reference to the superblock.
> >This has interesting side effects:
> >
> >$ sudo xfs_freeze -f /mnt/scratch
> >$ sudo umount /mnt/scratch
> >$ sudo mount /dev/vdc /mnt/scratch
> >$ sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
> ><blocked on frozen state>
> >^Z
> >[1]+  Stopped   sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
> >$ sudo xfs_freeze -u /mnt/scratch
> >$
> >$ fg
> >sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
> >wrote 65536/65536 bytes at offset 0
> >64 KiB, 16 ops; 0.0000 sec (inf EiB/sec and inf ops/sec)
> >$
> >
> >The freeze persists over unmount/mount, but we write to the
> >filesystem during unmount, and run log recovery and write stuff to
> >the filesystem as part of the normal mount process. IOWs, no
> >filesystem checks for SB_UNFROZEN in either the unmount or mount
> >path, and so we violate the freeze condition by writing to the block
> >device. This means that, as it stands, an unmount or mount violates
> >the frozen state guarantee, and unmount/mount effectively implies
> >that the filesystem is no longer frozen. Hence silent snapshot image
> >corruption is very possible, and the fact that it is silent is very
> >worrying.

> That is why my original proposal was to return EBUSY
> when userspace tries to umount a frozen filesystem.

Which was never going to fly, even though it makes sense.

> However this breaks lazy umount, and I was told that
> auto-thaw on umount is not acceptable, so I came up
> with the current patch set.

Silently corrupted snapshots due to unmount/mount when frozen are
pretty bad, and there's no way around that right now.

FWIW, re "I was told...": a reference to the discussion so I can find
out the reasons why it is unacceptable would be handy....

> >If we do this, all of the problems with persistent frozen superblock
> >state after unmount go away. At that point, there is no need for
> >block device level ioctls to clean up a persistent frozen superblock
> >because they don't exist anymore.
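[For reference, the xfs_freeze -f/-u commands in the transcript above drive the superblock-level freeze via the FIFREEZE/FITHAW ioctls from <linux/fs.h>. A minimal sketch of that userspace path - fs_set_frozen is an invented helper name, and error handling is simplified:]

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* FIFREEZE, FITHAW */

/*
 * Freeze (freeze != 0) or thaw the filesystem containing mntpt.
 * Returns 0 on success, -1 with errno set on failure. Requires
 * CAP_SYS_ADMIN, like xfs_freeze itself.
 */
static int fs_set_frozen(const char *mntpt, int freeze)
{
	int fd = open(mntpt, O_RDONLY);
	if (fd < 0)
		return -1;

	/* FIFREEZE/FITHAW act on the superblock, not the block device */
	int ret = ioctl(fd, freeze ? FIFREEZE : FITHAW, 0);
	int saved_errno = errno;

	close(fd);
	errno = saved_errno;
	return ret;
}
```

[fs_set_frozen("/mnt/scratch", 1) is then the moral equivalent of "xfs_freeze -f /mnt/scratch" in the transcript; note that nothing in this path touches bdev->bd_fsfreeze_count.]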
> >That means there is no need for
> >ioctls to check the frozen state of the filesystem before unmount,
> >either, because there is no problem to avoid....

> Check ioctls are still needed IMHO. For example, there
> is one issue we run into quite often in virtualization
> environments:
>
> We have a Linux guest OS where fsfreeze is performed by a
> guest agent controlled by the hypervisor. At the behest of
> the hypervisor the agent freezes a guest filesystem and, for
> whatever reason (a bug, for example), the agent dies after
> that.
> The filesystem was frozen without the guest's users
> being aware of it, so these users will find that writes to
> the filesystem block, and they have no way to know
> what is going on because we do not have check ioctls.

They should know what happened - the failure of the management
operation or agent should have been reported to the administrator. It
doesn't matter whether the filesystem is frozen or not - any failure
of an automated management operation that could affect the operation
of the guest VM should be raising alarms. At that point, the owner of
the guest will know what failed before they even start digging around
in the guest VM, and have some idea of what to look for and what to
resolve.

I'd be looking for a new VM infrastructure provider if the first I
knew about failed infrastructure management operations was customers
complaining about their VMs hanging....

> In such cases I was told by customers that they would
> like to use either emergency thaw (which this patch set
> attempts to fix) or a check ioctl.

If they can run a check ioctl, then they don't need an emergency
thaw - they can just run fsfreeze -u. Further, what those users want
is to know if the filesystem is frozen; they don't *want* an ioctl.
For them, something in /proc/mounts (e.g. adding a "frozen" state to
the mount options output), or /sys/block/<dev>/freeze_count for block
level freezes, would suffice. The information is then easy to obtain
and only requires cat to access.
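[If something like the suggested /sys/block/<dev>/freeze_count attribute existed - it is only a proposal here, so the path is hypothetical - consuming it from C would be about as simple as the cat invocation:]

```c
#include <stdio.h>

/*
 * Read the bdev-level freeze count for a block device.
 * NOTE: /sys/block/<dev>/freeze_count is the *proposed* interface
 * from the discussion above; no such sysfs attribute exists today.
 * Returns the count, or -1 if the attribute cannot be read.
 */
static int read_freeze_count(const char *dev)
{
	char path[256];
	snprintf(path, sizeof(path), "/sys/block/%s/freeze_count", dev);

	FILE *f = fopen(path, "r");
	if (!f)
		return -1;

	int count;
	if (fscanf(f, "%d", &count) != 1)
		count = -1;
	fclose(f);
	return count;
}
```

[A non-zero count would mean a bdev-level freeze is active; a plain "cat /sys/block/vdc/freeze_count" gives the same answer with no ioctl, no special tool, and no root requirement - which is the point being made.]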
Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx