On 08/23/2011 06:01 AM, Stefan Hajnoczi wrote:
On Mon, Aug 22, 2011 at 6:29 PM, Jan Kiszka <jan.kiszka@xxxxxxxxxxx> wrote:
On 2011-08-14 06:04, Avi Kivity wrote:
In certain circumstances, posix-aio-compat can incur a lot of latency:
- threads are created by vcpu threads, so if vcpu affinity is set,
aio threads inherit vcpu affinity. This can cause many aio threads
to compete for one cpu.
- we can create up to max_threads (64) aio threads in one go; since a
pthread_create can take around 30μs, we have up to 2ms of cpu time
under a global lock.
Fix by:
- moving thread creation to the main thread, so we inherit the main
thread's affinity instead of the vcpu thread's affinity.
- if a thread is currently being created and we need to create yet
another thread, let the thread being born create the new thread,
reducing the amount of time we spend in the main thread.
- drop the local lock while creating a thread (we may still hold the
global mutex, though)
Note this doesn't eliminate latency completely; scheduler artifacts or
lack of host cpu resources can still cause it. We may want pre-allocated
threads when this cannot be tolerated.
Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions.
While we're at it: what is the state of getting rid of the remaining
delta between upstream's version and qemu-kvm's?
That would be nice. qemu-kvm.git uses a signalfd to handle I/O
completion, whereas qemu.git uses a signal, writes to a pipe from the
signal handler, and uses qemu_notify_event() to kick the vcpu. Once
the force iothread patch is merged we should be able to move to
qemu-kvm.git's signalfd approach.
No need to use a signal at all actually. The use of a signal is
historic and was required to work around the TCG race that I referred to
in another thread.
You should be able to just use an eventfd or pipe.
Better yet, we should look at using GThreadPool to replace posix-aio-compat.
Regards,
Anthony Liguori
Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html