Re: disk corruption after virsh destroy

On Tue, Jul 02, 2013 at 10:40:11AM -0400, Brian J. Murrell wrote:
> I have a cluster of VMs set up with shared virtio-scsi disks.  The
> purpose of sharing a disk is that if a VM goes down, another can
> pick up and mount the (ext4) filesystem on the shared disk and
> provide service to it.
> 
> But just to be super clear, only one VM ever has a filesystem
> mounted at a time, even though multiple VMs technically can access
> the device at the same time.  Before mounting a filesystem, a VM
> verifies that no other node already has it mounted.
> 
> That said, what I am finding is that when a node dies and another
> node tries to mount the (ext4) filesystem, it is found dirty and
> needs an fsck.
> 
> My understanding is that with ext{3,4} this should not happen, and
> that matches my experience on real hardware with coherent disk
> caching (i.e. no non-battery-backed caching disk controllers lying
> to the O/S about what has been written to physical disk): a node
> failing does not leave an ext{3,4} filesystem dirty such that it
> needs an fsck.
> 
> So, clearly, somewhere between the KVM VM and the physical disk,
> there is a cache that leaves the guest O/S believing data has been
> written to physical disk when it actually has not.  To that end, I
> have ensured that these shared disks are configured with
> "cache=none", but this does not seem to have fixed the problem.

I expect journal replay, and possibly an fsck, when an ext4 file system
was left mounted with I/O pending (e.g. due to power failure).
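
It is also worth confirming that cache=none is actually in effect for
the running domain and not just in an edited config file.  A quick
check along these lines ("guest1" is only a placeholder name):

    # On the host: inspect the disk definitions of the live domain
    virsh dumpxml guest1 | grep -A4 '<disk'

The shared disk's <driver> element should show cache='none', for
example <driver name='qemu' type='raw' cache='none'/>, and the disk
should also carry <shareable/> so more than one guest may attach it.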

A few questions:

1. Is the guest mounting the file system with barrier=0?  barrier=1 is
   the default.

2. Do the physical disks have a volatile write cache enabled?  If so,
   the guest should use barrier=1.  If the physical disks have a
   non-volatile write cache, or the write cache is disabled, then
   barrier=0 is okay.  (Checks for both 1 and 2 are sketched after
   this list.)

3. Have you tested without the cluster?  Run a single VM and kill it
   while it is busy, then start it up again and see whether an fsck is
   needed.  (A minimal reproduction is sketched after this list.)

4. Is it possible that your previous cluster setup used tune2fs(8) to
   disable the automatic fsck in some cases?  That could explain why
   you didn't see an fsck before but do now.  (A way to check is
   sketched after this list.)
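
For 1 and 2, a rough way to check both sides (device names are
placeholders, adjust them to your setup):

    # Inside the guest: mount options actually in effect for the shared fs
    grep /dev/sdb1 /proc/mounts

    # On the host: is the physical disk's volatile write cache enabled?
    hdparm -W /dev/sda            # ATA/SATA disks
    sdparm --get=WCE /dev/sda     # SCSI/SAS disks

"barrier=0" or "nobarrier" in the mount options means the guest is not
flushing its writes past the caches; "write-caching = 1 (on)" from
hdparm means the disk has a volatile cache and barriers (or a battery)
are needed for safety.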
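
For 3, a minimal reproduction outside the cluster could look like this
(domain name and paths are again placeholders):

    # In the guest: keep the shared file system busy with writes
    dd if=/dev/zero of=/mnt/shared/testfile bs=1M count=4096 &

    # On the host: hard-kill the guest while the writes are in flight
    virsh destroy guest1

    # Boot it again and watch the console/log for journal recovery or fsck
    virsh start guest1

If that alone reproduces the dirty file system, the cluster layer can
be ruled out.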
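
For 4, the automatic-check settings are visible with tune2fs (device is
a placeholder); a maximum mount count of -1 and a check interval of 0
mean the periodic fsck has been turned off:

    tune2fs -l /dev/sdb1 | grep -iE 'mount count|check'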

Stefan