Re: writeback cache + h700 controller w/1gb nvcache, corruption on power loss

Stefan Hajnoczi <stefanha@xxxxxxxxx> · Tue, 17 Apr 2012 09:41:50 +0100

On Mon, Apr 16, 2012 at 9:51 AM, Ron Edison <ron@xxxxxxxxx> wrote:
> I would be very interested in how to ensure the guests are sending flushes. I'm unfamiliar with the example you gave, where is that configured?

barrier=1 is a mount option for ext3 and ext4 file systems, for
example "mount -o barrier=1 /dev/sda /mnt".

You probably don't want this actually since you have a battery-backed
RAID controller.  See below for more.
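
If you want to see what the guests are doing today, something along
these lines could be run inside each guest.  It is only a sketch:
barrier_state and list_barriers are made-up helper names, and how the
barrier option shows up in /proc/mounts varies between kernel versions,
so "default" only means that nothing explicit was listed.

```shell
# Hypothetical helper: classify a mount-options string (the 4th field
# of /proc/mounts) by whether barriers are explicitly on or off.
barrier_state() {
    case ",$1," in
        *,barrier=1,*|*,barrier,*)   echo on ;;
        *,barrier=0,*|*,nobarrier,*) echo off ;;
        *)                           echo default ;;
    esac
}

# Hypothetical helper: read /proc/mounts-style lines on stdin and print
# "<mountpoint> <fstype> <state>" for every ext3/ext4 file system.
list_barriers() {
    while read -r dev mnt fstype opts rest; do
        case "$fstype" in
            ext3|ext4) echo "$mnt $fstype $(barrier_state "$opts")" ;;
        esac
    done
}

# Usage inside a guest:
#   list_barriers < /proc/mounts
# Barriers can be turned on for a mounted file system with:
#   mount -o remount,barrier=1 /mnt
```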

> Primarily the guests are CentOS 4, 5 or 6. I am also curious if it would be advisable to switch to writethrough cache on each guest virtual disk and leave writeback enabled on the controller and if that would adversely affect performance of the guests.

The most conservative modes are cache=writethrough (uses host page
cache) and cache=directsync (does not use host page cache).  They both
ensure that every single write is flushed to disk.  Therefore they
have a performance penalty.  cache=directsync minimizes stress on host
memory because it bypasses the page cache.

Since you have a non-volatile cache in your RAID controller you can
also use cache=none.  This also bypasses the host page cache but it
does not flush every single write.  The guest may still send flushes
but even if it does not, the writes are going to the RAID controller's
non-volatile cache.
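
To make the options concrete, here is a rough sketch of how the cache
mode is passed on the QEMU command line.  drive_arg is a made-up
helper, /dev/vg0/guest1 is a placeholder for wherever the guest disk
lives, and the binary name (qemu-kvm here) depends on the
distribution.  Only the modes discussed in this thread are accepted.

```shell
# Hypothetical helper: build the value for QEMU's -drive option.
# $1 = disk image or block device, $2 = cache mode.
drive_arg() {
    case "$2" in
        writethrough|directsync|none|writeback) ;;
        *) echo "unknown cache mode: $2" >&2; return 1 ;;
    esac
    printf 'file=%s,cache=%s\n' "$1" "$2"
}

# Print (do not run) a command line for a guest whose disk sits behind
# the battery-backed controller.  The path is a placeholder.
echo "qemu-kvm -drive $(drive_arg /dev/vg0/guest1 none)"
```

If the guests are managed through a layer on top of QEMU, the same
cache setting has to be made there rather than on a hand-written
command line.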

> The disk corruption experienced was indeed lost data -- an fsck was necessary for 4 of the guests to boot at all in RW mode, they first came up read only. In the case of one of the guests there was actually files data / data lost after fsck was manually run upon reboot/single user mode. In some cases these were config files, in other database indexes, etc. This one of the 4 guests with the most severe corruption was not usable and we had to revert to a backup and pull current data out of it as much as possible.

Since you used QEMU -drive cache=writeback, data loss is expected on
host power failure.  cache=writeback uses the (volatile) host page
cache, so data may not have made it to the RAID controller
before power was lost.

Guest file system recovery - either a quick journal replay or a
painful fsck - is also expected on host power failure.  The file
systems are dirty since the guest stopped executing without cleanly
unmounting its file systems.  If you use cache=none or
cache=directsync then you should get a quick journal replay and the
risk of a painful fsck should be reduced (most/all of the data will
have been preserved).

Stefan