On Mon, Jul 11, 2011 at 1:29 PM, Kevin Wolf <kwolf@xxxxxxxxxx> wrote:
> On 11.07.2011 11:41, Pekka Enberg wrote:
>> Hi Kevin,
>>
>> On Mon, Jul 11, 2011 at 12:31 PM, Kevin Wolf <kwolf@xxxxxxxxxx> wrote:
>>> I would love to try out your code occasionally myself, but so far I have
>>> been too lazy to build a guest kernel only to be able to test it. Having
>>> to deal with the huge kernel git tree just for a small program doesn't
>>> really make it more fun either... Anyway, what I'm trying to say is that
>>> everything in my mails is from a purely theoretical POV. I have only
>>> looked at the code, but never really tried it.
>>
>> Most distro kernels boot just fine, AFAIK. If you have a kernel tree
>> lying around, you can use
>>
>>   git remote add kvm-tool git://github.com/penberg/linux-kvm.git
>>   git remote update kvm-tool
>>
>> to fetch the sources.
>
> Yeah, I do have the source and I read some parts of it. Just running it
> didn't seem to work with the standard Fedora kernel last time. Seems to
> work now, so it was probably my fault.
>
> Not sure what I did differently last time, maybe I relied on it to pick up
> kernel and initrd automatically from the host (it finds the kernel, but
> not the initrd).

Yeah, we should really add automatic initrd detection too.

>>> As Ingo already said, the cache mode is probably the major difference.
>>> From what I can see in your code, cache=writeback would be the
>>> equivalent for what tools/kvm is doing, however cache=none (i.e.
>>> O_DIRECT) is what people usually do with qemu.
>>
>> Yup, I posted 'cache=writeback' results too which are much closer to
>> tools/kvm numbers.
>
> Saw it. cache=none would probably help with the stability, but of course
> you would also have to add O_DIRECT to tools/kvm to make it fair.
>
>>> And then there seems to be another big difference. I hope I'm not
>>> missing anything, but you seem to be completely lacking refcount
>>> handling for qcow2. This is okay for read-only images, but with write
>>> access to the image, you're corrupting the images if you don't update
>>> the refcounts. Have you checked qcow2 images with qemu-img check after
>>> tools/kvm has written to them?
>>>
>>> Maintaining the right order between L2 writes and refcount block writes
>>> is another source of flushes in qemu, which of course makes a difference
>>> for performance.
>>
>> Yes, you're absolutely correct. We don't support copy-on-write images
>> and I didn't realize until yesterday evening that we don't even check
>> the 'copied' bit to make sure writes are safe.
>>
>> However, for these particular numbers, it doesn't matter that much
>> because it's all append-only and thus shouldn't trigger any of the
>> copy-on-write paths.
>
> It has nothing to do with copy on write. Well, of course COW is the
> reason why the refcounts exist at all, but for a correct qcow2 image
> they must be consistent even when you don't do COW.
>
> The problem is that when you run an image, in which tools/kvm has
> allocated new clusters, in qemu, it will use the refcount table and
> still see the clusters as free. So you'll end up with two guest disk
> clusters mapped to the same cluster in the image file and that obviously
> means that you'll get data corruption.
>
> qemu-img check would tell you about such inconsistencies.

Aah, OK, we need to fix that. Thanks!
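
For reference, the failure mode Kevin describes can be sketched with a toy
model. This is not tools/kvm or qemu code and it ignores the real qcow2
on-disk layout (header, refcount table/blocks, L1/L2 tables); every name in
it (ToyImage, alloc_without_refcounts, and so on) is hypothetical and only
meant to show why an allocator that skips refcount updates leads to two
guest clusters sharing one host cluster.

```python
from collections import defaultdict


class ToyImage:
    """Toy stand-in for a qcow2 image: a refcount array plus one flat mapping."""

    def __init__(self, num_clusters):
        # refcount[i] == 0 means host cluster i is considered free.
        self.refcount = [0] * num_clusters
        # guest cluster index -> host cluster index (stands in for L2 tables).
        self.l2 = {}
        # Next host cluster when simply appending to the image file.
        self.file_end = 0

    def alloc_without_refcounts(self):
        """Append-only allocation that never touches the refcounts
        (the behaviour the thread attributes to tools/kvm)."""
        host = self.file_end
        self.file_end += 1
        return host

    def alloc_with_refcounts(self):
        """Allocation that trusts the refcounts: take the first cluster
        whose refcount is 0 and mark it as used."""
        host = self.refcount.index(0)
        self.refcount[host] = 1
        self.file_end = max(self.file_end, host + 1)
        return host


def refcount_mismatches(img):
    """Host clusters that are mapped by some guest cluster but still have
    refcount 0 (the kind of inconsistency a checker would report)."""
    return sorted({h for h in img.l2.values() if img.refcount[h] == 0})


def double_mapped(img):
    """Host clusters that more than one guest cluster points at."""
    users = defaultdict(list)
    for guest, host in sorted(img.l2.items()):
        users[host].append(guest)
    return {host: guests for host, guests in users.items() if len(guests) > 1}


def demo():
    img = ToyImage(num_clusters=8)
    # First writer maps guest cluster 0 but leaves the refcount at 0.
    img.l2[0] = img.alloc_without_refcounts()
    mismatches = refcount_mismatches(img)
    # Second writer trusts the refcounts, sees host cluster 0 as free
    # and hands it out again for guest cluster 1.
    img.l2[1] = img.alloc_with_refcounts()
    return img, mismatches


if __name__ == "__main__":
    img, mismatches = demo()
    print("mapped but refcount 0:", mismatches)
    print("double-mapped host clusters:", double_mapped(img))
```

In this sketch the corruption needs no copy-on-write at all: a plain
append-only write that skips the refcount update is enough, which matches
the point made above.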