Re: [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 14 Sep 2012 10:15:32 +1000

On Thu, Sep 13, 2012 at 05:19:21PM +0900, Fernando Luis Vazquez Cao wrote:
> On 2012/09/13 16:18, Dave Chinner wrote:
> >On Thu, Sep 13, 2012 at 03:19:23PM +0900, Fernando Luis Vazquez Cao wrote:
> >>On 2012/07/16 07:45, Dave Chinner wrote:
> >>>On Fri, Jul 13, 2012 at 03:54:54PM +0200, Jan Kara wrote:
> >>>>On Thu 12-07-12 18:10:14, Fernando Luis Vázquez Cao wrote:
> >>>>>The FIISFROZEN ioctl can be used by HA and monitoring software to check
> >>>>>the freeze state of a mounted filesystem.
> >>>Can you explain in more detail why the HA system needs to check this?
> >>>And, for that matter, what it does with that information?
> >>What our HA guys told me is that certain fencing scripts
> >>try to umount filesystems that can be in frozen state. The
> >>problem is that when we umount a frozen filesystem the
> >>superblock still stays around which can lead to a split-brain
> >>scenario.
> >Then the bug is that unmounting a frozen filesystem is not
> >working correctly. Fix the problem, don't add new APIs to try to
> >detect a state where the bug might get tripped over and avoid it.
> 
> The problem is that we allow users to umount a frozen
> filesystem but we have neither a bdev level fsfreeze
> API to thaw it after that nor check ioctls.

Yes, I know they don't exist, but you have to justify why they are
needed. So far you haven't.

> I proposed returning EBUSY when userspace tries to umount a
> frozen filesystem, but this would break lazy umount, which
> I was told by Al Viro and Josef Bacik is a no go, so I discarded
> this approach.
> 
> I will follow Al's advice from a few years ago and do the following:
> 1- Let userspace umount frozen filesystems.

Which it already can.

> 2- Provide a bdev level ioctl to unfreeze a umounted filesystem.

The only time a block device knows about the frozen state of the
superblock is when dm-snapshot drives the freeze from the block
device. There are special hooks in, e.g. mount_bdev() for checking
whether there is an active bdev freeze and this prevents new mounts
from occurring while a bdev freeze is in progress. DM also
removes the freeze state itself, so this is not a persistent state.

IOWs, there are two specific freeze types - one a superblock (user)
level freeze, and the other is a block device (kernel) level freeze.
What you are proposing here means that the user, in certain
circumstances, needs to manipulate superblock level freezes from the
block level because the superblock is no longer visible to the
user.  It's a recipe for confusion and convoluted bugs, and it sure
as hell won't work for all filesystems. e.g. see 18e9e51 ("Introduce
freeze_super and thaw_super for the fsfreeze ioctl") as the reason
why superblock level freezes exist and why trying to thaw from the
bdev level doesn't work.

Indeed, what happens when the superblock freeze is driven from
dm-snapshot, and the user unmounts the fs and runs the blockdev
ioctl to drop the freeze reference that dm-snapshot holds?
That would free the superblock out from underneath DM, so this is
a can of worms I'd prefer not to open.

> I will also:
> 3-  add the check ioctls so that users can check whether a
> filesystem is frozen or not and avoid surprises.

Which only solves the problem for users that know they have to check
the state in certain corner cases. That doesn't fix the underlying
problem.

The reason this problem exists is that an active superblock level
freeze holds a reference to the superblock. This has interesting
side effects:

$ sudo xfs_freeze -f /mnt/scratch
$ sudo umount /mnt/scratch
$ sudo mount /dev/vdc /mnt/scratch
$ sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
<blocked on frozen state>
^Z
[1]+  Stopped                 sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
$ sudo xfs_freeze -u /mnt/scratch
$
$ fg
sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
wrote 65536/65536 bytes at offset 0
64 KiB, 16 ops; 0.0000 sec (inf EiB/sec and inf ops/sec)
$

The freeze persists over unmount/mount, but we write to the
filesystem during unmount, and run log recovery and write stuff to
the filesystem as part of the normal mount process. IOWs, no
filesystem checks for SB_UNFROZEN in either the unmount or mount
path and so we violate the freeze condition by writing to the block
device. This means that as it stands, an unmount or mount violates
the frozen state guarantee and unmount/mount effectively imply that
the filesystem is no longer frozen.  Hence silent snapshot image
corruption is very possible, and the fact that it is silent is very
worrying.

If you have an HA system that does user level freeze operations, then
you need an HA agent to handle the application that needs frozen
filesystems appropriately (i.e. tell it that whatever operation it
was doing has now failed and the filesystem is unfrozen, and, BTW,
shut down so I can start you over there).  Trying to work around
weird freeze semantics at the filesystem unmount/blockdev level is
not going to make the application that froze the filesystem suddenly
avoid silent fs image corruption.

As it is, the requirement that we allow unmounting of frozen
filesystems implies that an unmount operation must also be a thaw
operation. The filesystem is no longer visible to the user, and they
have no method of controlling its state anymore, so the user has
said to the kernel "it's all yours now". If the filesystem was
frozen when the unmount is issued, then the user is saying "I don't
care about the frozen state anymore". If they did care, then they
wouldn't be unmounting the filesystem without having first issued a
thaw operation.

FWIW, while we are unmounting the filesystem, we hold the s_umount lock
exclusively, so the filesystem cannot be thawed while an unmount is
in progress. Hence there is no reason why the unmount can't thaw the
superblock and take away its reference as part of the unmount.

If we do this, all of the problems with persistent frozen superblock
state after unmount go away.  At this point, there is no need for
block device level ioctls to clean up persistent frozen superblocks
because they don't exist anymore. That means there is no need for
ioctls to check the frozen state of the filesystem before unmount,
either, because there is no problem to avoid....
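
To make the reference counting argument concrete, here is a toy
user-space model in Python. It is a sketch, not kernel code: the
class names, the "active" counter and the thaw_on_umount switch are
all made up for illustration. The only behaviour it assumes is what
is described above: a freeze pins the superblock with a reference,
and the proposed fix has unmount thaw the superblock and drop that
reference.

```python
class Superblock:
    """Toy stand-in for a superblock; all fields are illustrative."""

    def __init__(self, dev):
        self.dev = dev
        self.active = 0      # models the superblock reference count
        self.frozen = False  # models the user level freeze state


class ToyVFS:
    """Tracks which superblocks are still alive, keyed by device."""

    def __init__(self, thaw_on_umount):
        self.thaw_on_umount = thaw_on_umount
        self.supers = {}

    def mount(self, dev):
        # Reuse a superblock that is still pinned, else make a new one.
        sb = self.supers.get(dev)
        if sb is None:
            sb = Superblock(dev)
            self.supers[dev] = sb
        sb.active += 1
        return sb

    def freeze(self, sb):
        if sb.frozen:
            raise RuntimeError("already frozen")
        sb.frozen = True
        sb.active += 1       # the freeze holds its own reference

    def thaw(self, sb):
        if not sb.frozen:
            raise RuntimeError("not frozen")
        sb.frozen = False
        self._put(sb)        # drop the reference taken by freeze()

    def umount(self, sb):
        # Proposed behaviour: unmount implies thaw.
        if self.thaw_on_umount and sb.frozen:
            self.thaw(sb)
        self._put(sb)        # drop the mount's reference

    def _put(self, sb):
        sb.active -= 1
        if sb.active == 0:
            del self.supers[sb.dev]


def freeze_umount_mount(thaw_on_umount):
    """Return (superblock survived umount, new mount is frozen)."""
    vfs = ToyVFS(thaw_on_umount)
    sb = vfs.mount("/dev/vdc")
    vfs.freeze(sb)
    vfs.umount(sb)
    survived = "/dev/vdc" in vfs.supers
    remounted = vfs.mount("/dev/vdc")
    return survived, remounted.frozen


if __name__ == "__main__":
    print("current: ", freeze_umount_mount(thaw_on_umount=False))
    print("proposed:", freeze_umount_mount(thaw_on_umount=True))
```

With thaw_on_umount=False the model returns (True, True): the
superblock outlives the unmount and the remount comes up frozen, as
in the xfs_freeze session earlier in this mail. With
thaw_on_umount=True it returns (False, False), so there is nothing
left for a bdev level ioctl to clean up.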

> That said, even if the problem above did not exist it still would
> be nice to have check ioctls from an API point of view.

"Nice to have" is not a good enough reason to add new APIs. Fix the
underlying problems first, then we can discuss new APIs on their
merits.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx