Christoph Hellwig wrote:
On Mon, Mar 23, 2009 at 12:14:58PM -0500, Anthony Liguori wrote:
I'd like to see the O_DIRECT bounce buffering removed in favor of the
DMA API bouncing. Once that happens, raw_read and raw_pread can
disappear. block-raw-posix becomes much simpler.
See my vectored I/O patches for doing the bounce buffering at the
optimal place for the aio path. Note that from my reading of the
qcow/qcow2 code they might send down unaligned requests, which is
something the dma api would not help with.
I was going to look today at applying those.
For the buffered I/O path we will always have to do some sort of buffering
due to all the partition header reading / etc. And given how that part
isn't performance critical my preference would be to keep doing it in
bdrv_pread/write and guarantee the lowlevel drivers proper alignment.
I really dislike having so many APIs. I'd rather have an aio API that
took byte accesses or have pread/pwrite always be emulated with a full
sector read/write.
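
As a rough illustration of "pread emulated with a full sector read", here is a
minimal sketch in C. It is not QEMU code: emulated_pread, sector_read_fn,
struct mem_disk and mem_read_sectors are hypothetical names, and the 512-byte
sector size is an assumption. The caller may pass any byte offset and length,
while the low-level driver only ever sees whole sectors.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SECTOR_SIZE 512 /* assumed sector size for this sketch */

/* Hypothetical low-level driver hook: transfers whole sectors only.
 * Returns 0 on success, -1 on error. */
typedef int (*sector_read_fn)(void *opaque, uint64_t sector_num,
                              uint8_t *buf, int nb_sectors);

/* Byte-granular read built on full-sector reads via a bounce buffer.
 * A real O_DIRECT path would also need the bounce buffer itself to be
 * suitably aligned in memory (e.g. posix_memalign); malloc keeps the
 * sketch short. */
static int emulated_pread(sector_read_fn read_sectors, void *opaque,
                          uint64_t offset, void *dst, size_t count)
{
    uint64_t first, last;
    size_t nb_sectors;
    uint8_t *bounce;
    int ret;

    assert(dst != NULL);
    if (count == 0) {
        return 0;
    }
    first = offset / SECTOR_SIZE;               /* first sector touched */
    last = (offset + count - 1) / SECTOR_SIZE;  /* last sector touched */
    nb_sectors = (size_t)(last - first + 1);

    bounce = malloc(nb_sectors * SECTOR_SIZE);
    if (bounce == NULL) {
        return -1;
    }
    ret = read_sectors(opaque, first, bounce, (int)nb_sectors);
    if (ret == 0) {
        /* Copy out only the bytes the caller asked for. */
        memcpy(dst, bounce + (offset % SECTOR_SIZE), count);
    }
    free(bounce);
    return ret;
}

/* In-memory "disk", used only to exercise the sketch. */
struct mem_disk {
    const uint8_t *data;
    uint64_t nb_sectors;
};

static int mem_read_sectors(void *opaque, uint64_t sector_num,
                            uint8_t *buf, int nb_sectors)
{
    struct mem_disk *d = opaque;

    if (sector_num + (uint64_t)nb_sectors > d->nb_sectors) {
        return -1; /* request runs past the end of the disk */
    }
    memcpy(buf, d->data + sector_num * SECTOR_SIZE,
           (size_t)nb_sectors * SECTOR_SIZE);
    return 0;
}
```

With this shape the low-level driver can assume sector alignment, at the cost
of one extra copy on a path that is not performance critical.
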
We would drop the signaling stuff and have the thread pool use an fd to
signal. The big problem with that right now is that it'll cause a
performance regression for certain platforms until we have the IO thread
in place.
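
To make "have the thread pool use an fd to signal" concrete, here is a minimal
sketch using a plain pipe and pthreads. It is not the QEMU thread pool:
struct job, worker and run_job_with_fd_notification are hypothetical names,
and the doubling of the input merely stands in for a blocking I/O request.
The worker writes one byte to the pipe when it is done, and the main loop
learns of the completion through select() rather than through a signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical request handed to a worker thread. */
struct job {
    int notify_fd; /* write end of the completion pipe */
    int input;
    int result;
};

static void *worker(void *arg)
{
    struct job *j = arg;
    char byte = 1;

    assert(j != NULL);
    j->result = j->input * 2; /* stand-in for a blocking I/O request */

    /* Wake the main loop by making the pipe readable. */
    while (write(j->notify_fd, &byte, 1) < 0 && errno == EINTR) {
        /* retry if interrupted */
    }
    return NULL;
}

/* Runs one job on a worker thread and waits for its completion in
 * select(). Returns the job result, or -1 on failure. */
static int run_job_with_fd_notification(int input)
{
    int fds[2];
    struct job j;
    pthread_t tid;
    fd_set rfds;
    char byte;
    int n;

    if (pipe(fds) < 0) {
        return -1;
    }
    j.notify_fd = fds[1];
    j.input = input;
    j.result = 0;
    if (pthread_create(&tid, NULL, worker, &j) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Main loop: the completion arrives as a readable fd. */
    do {
        FD_ZERO(&rfds);
        FD_SET(fds[0], &rfds);
        n = select(fds[0] + 1, &rfds, NULL, NULL, NULL);
    } while (n < 0 && errno == EINTR);

    if (n == 1 && FD_ISSET(fds[0], &rfds)) {
        n = (int)read(fds[0], &byte, 1); /* consume the notification */
    }

    /* Joining the worker makes its write to j.result visible here. */
    pthread_join(tid, NULL);
    close(fds[0]);
    close(fds[1]);
    return n == 1 ? j.result : -1;
}
```

On Linux an eventfd could replace the pipe. Note that nothing in this sketch
interrupts a thread that is busy elsewhere; the completion is only seen once
something calls select().
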
Talking about signaling, does anyone remember why the Linux signalfd/
eventfd support is only in kvm but not in upstream qemu?
Because upstream QEMU doesn't yet have an IO thread.
TCG chains together TBs and if you have a tight loop in a VCPU, then the
only way to break out of the loop is to receive a signal. The signal
handler will call cpu_interrupt() which will unchain TBs allowing TCG
execution to break once you return from the signal handler.
An IO thread solves this in a different way by letting select() always
run in parallel to TCG VCPU execution. When select() returns you can
send a signal to the TCG VCPU thread to break it out of chained TBs.
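
A minimal sketch of that kick, assuming POSIX threads and C11 atomics. It does
not model TCG or TB chaining: vcpu_thread is a hypothetical busy loop standing
in for chained TBs, kick_handler stands in for the signal handler that would
call cpu_interrupt(), and run_and_kick_vcpu plays the part of the IO thread at
the point where select() has returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

static atomic_int exit_request; /* set from the signal handler */
static atomic_int vcpu_running; /* set once the loop has been entered */

/* Stand-in for the handler that would call cpu_interrupt(). */
static void kick_handler(int sig)
{
    (void)sig;
    atomic_store(&exit_request, 1);
}

/* Stand-in for a VCPU stuck in a tight guest loop. */
static void *vcpu_thread(void *arg)
{
    unsigned long *iterations = arg;

    assert(iterations != NULL);
    atomic_store(&vcpu_running, 1);
    while (!atomic_load(&exit_request)) {
        (*iterations)++; /* "guest code" that never yields by itself */
    }
    return NULL;
}

/* Plays the IO thread: once the VCPU loop is running, send it a signal
 * so that it breaks out. Returns 0 on success, -1 on failure. */
static int run_and_kick_vcpu(unsigned long *iterations)
{
    struct sigaction sa;
    pthread_t vcpu;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = kick_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) < 0) {
        return -1;
    }

    atomic_store(&exit_request, 0);
    atomic_store(&vcpu_running, 0);
    *iterations = 0;
    if (pthread_create(&vcpu, NULL, vcpu_thread, iterations) != 0) {
        return -1;
    }
    while (!atomic_load(&vcpu_running)) {
        sched_yield(); /* wait until the loop has been entered */
    }

    /* In the real design this would follow select() returning. */
    if (pthread_kill(vcpu, SIGUSR1) != 0) {
        return -1;
    }
    pthread_join(vcpu, NULL);
    return 0;
}
```

The signal is directed at one specific thread with pthread_kill(), so only the
VCPU thread is interrupted and the IO thread can keep polling.
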
Not all IO in qemu generates a signal, so this is a potential problem, but in
practice, if we don't generate a signal for disk IO completion, a number
of real-world guests break (mostly non-x86 boards).
Regards,
Anthony Liguori