Hi Peter,

I'm the current maintainer of Quorum in QEMU and I'd like to try to answer some of your comments.

On Fri, Jan 08, 2016 at 06:20:04PM +0100, Peter Krempa wrote:
> So I have a few comments/observations regarding the quorum block
> driver in qemu and its usability.
>
> At first I'd like to ask you to describe your use case a bit
> more. I'm currently lacking the motivation to do anything about
> this, as the series is just partial and I don't really see any
> advantage of using the quorum driver at all and can't come up with
> any useful use case.
>
> Also a good use case is usually a good reason to drive development
> of a feature and I'm afraid that this could become abandoned without
> any real use.

The original use case for which Quorum was designed was a data center doing redundancy with storage in multiple separate rooms shared using NFS. One of the issues that the customer was facing was not only problems in the file servers themselves but, mainly, data corruption across the network. Quorum can correct this on the fly and is able to identify which one of the file servers is causing the problem without having to rebuild a whole array (as would be the case with RAID).

Quorum is also used for the COLO block replication functionality currently being discussed in QEMU:

http://wiki.qemu.org/Features/BlockReplication

> 1) No tracking of integrity
> As the quorum members don't have headers, failed quorum members
> are not recorded and remembered. The user or management app then
> has to do this externally for given storage devices.
>
> 2) No internal tracking of quorum members
> Members of the quorum don't have any header marking them
> as such and thus any images may be mixed together with
> unforeseen/catastrophic results. Higher level management then
> needs to take the role of remembering which images belong
> together. Reimplementing this looks like reimplementing a
> distributed storage system to me.
That's right, Quorum does not have its own file format and was designed to work with any driver or protocol that QEMU supports, so I'm not sure if there's much that can be done about this.

> 3) Lack of auto-resync:
> Once the quorum gets a few inconsistencies it does not
> automatically resync like the Linux MD driver. With the current
> implementation the only way to resync this would be to issue a
> block-mirror (blockCopy) to /dev/null so that all blocks are
> read and rewritten to the identical copy. This also requires a
> user action.
>
> Additionally, a member of the quorum is not ignored if it was
> out of sync at any previous time without being resynced, allowing
> for split-brain/corruption scenarios.

Quorum can fix errors on the fly (there's the 'rewrite-corrupted' flag for that), so in those cases no manual intervention is required.

If we want a way to auto-resync a complete image, that should be doable; I believe it's relatively simple to implement in QEMU (depending on the semantics).

For the manual resync, I also agree that it would be good to have a simple API to do that in case the user wants to do it manually. That can be done.

> 4) Necessity for at least 3 copies
> Since a majority needs to win in a vote, you need at least 3
> member disks for this to be fault-tolerant.
>
> 5) Lack of speedup
> Since all blocks are always read from all members and verified,
> the quorum backend doesn't really add any speed to the
> reads. This can be mostly attributed to the fact that fault
> tracking is not present.
>
> In other cases, due to internal error correcting codes it's very
> unlikely that a storage medium would return a corrupted sector
> without producing an error.

4) and 5) are part of the design of Quorum. As I said, one of the goals is to detect (and correct) silent data corruption on the fly, not to speed up disk access or to be space efficient.
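To make the trade-off in 4) and 5) concrete, here is a minimal Python sketch of majority voting over one block: every member is read, identical results are counted, and the version backed by at least `vote_threshold` members wins. Members that disagree are reported and, if a flag analogous to 'rewrite-corrupted' is set, overwritten with the winning data. This is not QEMU's code: `quorum_read()`, `read_and_repair()`, `VoteResult` and the `vote_threshold` parameter are hypothetical names, members are modelled as in-memory byte arrays, and the real driver's comparison and error reporting may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class VoteResult:
    data: bytes            # content agreed on by at least vote_threshold members
    dissenters: List[int]  # indices of members whose read differed from the winner


def quorum_read(reads: List[bytes], vote_threshold: int) -> VoteResult:
    """Hypothetical majority vote over one block read from every member."""
    # Group identical reads and pick the most common version.
    winner, votes = Counter(reads).most_common(1)[0]
    if votes < vote_threshold:
        # No version is backed by enough members, so the read fails.
        raise IOError("quorum not reached: %d/%d votes" % (votes, vote_threshold))
    dissenters = [i for i, r in enumerate(reads) if r != winner]
    return VoteResult(winner, dissenters)


def read_and_repair(members: List[bytearray], offset: int, length: int,
                    vote_threshold: int, rewrite_corrupted: bool = False) -> VoteResult:
    """Read a block from all members, vote, and optionally fix the minority."""
    reads = [bytes(m[offset:offset + length]) for m in members]
    result = quorum_read(reads, vote_threshold)
    if rewrite_corrupted:
        # Analogue of 'rewrite-corrupted': overwrite the disagreeing copies.
        for i in result.dissenters:
            members[i][offset:offset + length] = result.data
    return result


if __name__ == "__main__":
    # Three members; the third one has silently corrupted its second byte.
    members = [bytearray(b"AAAA"), bytearray(b"AAAA"), bytearray(b"AXAA")]
    result = read_and_repair(members, 0, 4, vote_threshold=2, rewrite_corrupted=True)
    print(result.data, result.dissenters)  # b'AAAA' [2]
    print(bytes(members[2]))               # b'AAAA'
```

With only two members, a single bad copy gives a 1-1 split that cannot reach a threshold of 2, which illustrates why at least three copies are needed to both detect and correct an error. The `dissenters` list is also the kind of per-member information a management-level warning about non-unanimous votes would need.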
> 6) Almost every remote storage technology does quorums internally
> Any distributed storage (ceph/rbd, gluster, sheepdog, etc.)
> provides the quorum functionality internally with the added benefit
> that their internal workings fix problems when a network split
> occurs.
>
> 7) Tools are restricted to qemu and qemu-img
> It's a "proprietary" implementation so for a rebuild you have
> to use one of the two tools. AFAIK qemu-img is not really
> user friendly for the less common disk backends and we don't
> really provide any abstraction on top of that. This means
> that there really aren't any reasonable tools to do an offline
> resync. (Okay, if you know which instance is okay, you can just
> copy it ...)

Right. If this is important, I can propose writing a tool for QEMU to deal with this. It's probably a good idea anyway.

> This series also lacks implementation of any user/management
> warning method that a block operation didn't have 100% votes in the
> quorum voting, thus it's not really possible for the users to do a
> rebuild/diagnostic if something fails.

I can't say much about this series because I haven't looked into the code in detail yet, but I'm willing to help fix the existing problems, add the missing features and improve the code (both in libvirt and QEMU) if there are no other major blockers.

Thanks,

Berto

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list