On 11.07.2011 11:41, Pekka Enberg wrote:
> Hi Kevin,
>
> On Mon, Jul 11, 2011 at 12:31 PM, Kevin Wolf <kwolf@xxxxxxxxxx> wrote:
>> I would love to try out your code occasionally myself, but so far I have
>> been too lazy to build a guest kernel only to be able to test it. Having
>> to deal with the huge kernel git tree just for a small program doesn't
>> really make it more fun either... Anyway, what I'm trying to say is that
>> everything in my mails is from a purely theoretical POV. I have only
>> looked at the code, but never really tried it.
>
> Most distro kernels boot just fine, AFAIK. If you have a kernel tree
> lying around, you can use
>
>   git remote add kvm-tool git://github.com/penberg/linux-kvm.git
>   git remote update kvm-tool
>
> to fetch the sources.

Yeah, I do have the source and I read some parts of it. Just running it
didn't seem to work with the standard Fedora kernel last time. Seems to
work now, so it was probably my fault. Not sure what I did differently
last time; maybe I relied on it to pick up the kernel and initrd
automatically from the host (it finds the kernel, but not the initrd).

>> As Ingo already said, the cache mode is probably the major difference.
>> From what I can see in your code, cache=writeback would be the
>> equivalent for what tools/kvm is doing, however cache=none (i.e.
>> O_DIRECT) is what people usually do with qemu.
>
> Yup, I posted 'cache=writeback' results too which are much closer to
> tools/kvm numbers.

Saw it. cache=none would probably help with the stability, but of course
you would also have to add O_DIRECT to tools/kvm to make it fair.

>> And then there seems to be another big difference. I hope I'm not
>> missing anything, but you seem to be completely lacking refcount
>> handling for qcow2. This is okay for read-only images, but with write
>> access to the image, you're corrupting the images if you don't update
>> the refcounts.
>> Have you checked qcow2 images with qemu-img check after
>> tools/kvm has written to them?
>>
>> Maintaining the right order between L2 writes and refcount block writes
>> is another source of flushes in qemu, which of course makes a difference
>> for performance.
>
> Yes, you're absolutely correct. We don't support copy-on-write images
> and I didn't realize until yesterday evening that we don't even check
> the 'copied' bit to make sure writes are safe.
>
> However, for these particular numbers, it doesn't matter that much
> because it's all append-only and thus shouldn't trigger any of the
> copy-on-write paths.

It has nothing to do with copy-on-write. Well, of course COW is the
reason why the refcounts exist at all, but for a correct qcow2 image they
must be consistent even when you don't do COW.

The problem is that when you take an image in which tools/kvm has
allocated new clusters and run it in qemu, qemu will use the refcount
table and still see those clusters as free. So you'll end up with two
guest disk clusters mapped to the same cluster in the image file, and
that obviously means that you'll get data corruption.

qemu-img check would tell you about such inconsistencies.

Kevin
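
To make the failure mode above concrete, here is a minimal toy sketch in
Python. It is not the qcow2 on-disk format and not code from qemu or
tools/kvm: the class ToyImage, its two allocators and the check() helper
are all hypothetical names invented for this illustration. One writer
appends clusters without touching the refcounts, a second writer trusts
the refcount table, and both end up using the same host cluster.

```python
class ToyImage:
    """Toy model of an image with a refcount table and an L2 mapping."""

    def __init__(self, num_clusters):
        self.refcounts = [0] * num_clusters  # host cluster -> reference count
        self.l2 = {}                         # guest cluster -> host cluster
        self.file_end = 0                    # next unused host cluster at end of file

    def alloc_append_only(self, guest):
        """Hypothetical writer that appends a cluster and ignores refcounts."""
        host = self.file_end
        self.file_end += 1
        self.l2[guest] = host
        return host

    def alloc_by_refcount(self, guest):
        """Hypothetical writer that picks the first cluster with refcount 0."""
        host = self.refcounts.index(0)
        self.refcounts[host] += 1
        self.l2[guest] = host
        return host


def check(img):
    """Return {host: (stored refcount, actual L2 references)} for mismatches."""
    used = {}
    for host in img.l2.values():
        used[host] = used.get(host, 0) + 1
    return {h: (img.refcounts[h], n)
            for h, n in used.items() if img.refcounts[h] != n}


img = ToyImage(4)
a = img.alloc_append_only(0)   # first writer: refcount of the cluster stays 0
b = img.alloc_by_refcount(1)   # second writer: sees that cluster as free
print(a, b, check(img))
```

Both guest clusters land on host cluster 0, so a write to one overwrites
the other. The mismatch that check() reports is only a rough stand-in
for the kind of inconsistency qemu-img check would flag.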