2011/8/8 Avi Kivity <avi@xxxxxxxxxx>:
> In certain circumstances, posix-aio-compat can incur a lot of latency:
>  - threads are created by vcpu threads, so if vcpu affinity is set,
>    aio threads inherit vcpu affinity.  This can cause many aio threads
>    to compete for one cpu.
>  - we can create up to max_threads (64) aio threads in one go; since a
>    pthread_create can take around 30μs, we have up to 2ms of cpu time
>    under a global lock.
>
> Fix by:
>  - moving thread creation to the main thread, so we inherit the main
>    thread's affinity instead of the vcpu thread's affinity.
>  - if a thread is currently being created, and we need to create yet
>    another thread, let the thread being born create the new thread,
>    reducing the amount of time we spend under the main thread.
>  - drop the local lock while creating a thread (we may still hold the
>    global mutex, though)
>
> Note this doesn't eliminate latency completely; scheduler artifacts or
> lack of host cpu resources can still cause it.  We may want pre-allocated
> threads when this cannot be tolerated.
>
> Thanks to Uli Obergfell of Red Hat for his excellent analysis and
> suggestions.
>
> Signed-off-by: Avi Kivity <avi@xxxxxxxxxx>

Why not call pthread_attr_setaffinity_np (where available) before thread
creation, or sched_setaffinity at thread start, instead of telling another
thread to create a thread for us just to get the affinity cleared?

Regards
Frediano

> ---
>  posix-aio-compat.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/posix-aio-compat.c b/posix-aio-compat.c
> index 8dc00cb..aa30673 100644
> --- a/posix-aio-compat.c
> +++ b/posix-aio-compat.c
> @@ -30,6 +30,7 @@
>
>  #include "block/raw-posix-aio.h"
>
> +static void do_spawn_thread(void);
>
>  struct qemu_paiocb {
>      BlockDriverAIOCB common;
> @@ -64,6 +65,9 @@ static pthread_attr_t attr;
>  static int max_threads = 64;
>  static int cur_threads = 0;
>  static int idle_threads = 0;
> +static int new_threads = 0;     /* backlog of threads we need to create */
> +static int pending_threads = 0; /* threads created but not running yet */
> +static QEMUBH *new_thread_bh;
>  static QTAILQ_HEAD(, qemu_paiocb) request_list;
>
>  #ifdef CONFIG_PREADV
> @@ -311,6 +315,13 @@ static void *aio_thread(void *unused)
>
>      pid = getpid();
>
> +    mutex_lock(&lock);
> +    if (new_threads) {
> +        do_spawn_thread();
> +    }
> +    pending_threads--;
> +    mutex_unlock(&lock);
> +
>      while (1) {
>          struct qemu_paiocb *aiocb;
>          ssize_t ret = 0;
> @@ -381,11 +392,18 @@ static void *aio_thread(void *unused)
>      return NULL;
>  }
>
> -static void spawn_thread(void)
> +static void do_spawn_thread(void)
>  {
>      sigset_t set, oldset;
>
> -    cur_threads++;
> +    if (!new_threads) {
> +        return;
> +    }
> +
> +    new_threads--;
> +    pending_threads++;
> +
> +    mutex_unlock(&lock);
>
>      /* block all signals */
>      if (sigfillset(&set)) die("sigfillset");
> @@ -394,6 +412,31 @@ static void spawn_thread(void)
>      thread_create(&thread_id, &attr, aio_thread, NULL);
>
>      if (sigprocmask(SIG_SETMASK, &oldset, NULL)) die("sigprocmask restore");
> +
> +    mutex_lock(&lock);
> +}
> +
> +static void spawn_thread_bh_fn(void *opaque)
> +{
> +    mutex_lock(&lock);
> +    do_spawn_thread();
> +    mutex_unlock(&lock);
> +}
> +
> +static void spawn_thread(void)
> +{
> +    cur_threads++;
> +    new_threads++;
> +    /* If there are threads being created, they will spawn new workers, so
> +     * we don't spend time creating many threads in a loop holding a mutex or
> +     * starving the current vcpu.
> +     *
> +     * If there are no idle threads, ask the main thread to create one, so we
> +     * inherit the correct affinity instead of the vcpu affinity.
> +     */
> +    if (!pending_threads) {
> +        qemu_bh_schedule(new_thread_bh);
> +    }
>  }
>
>  static void qemu_paio_submit(struct qemu_paiocb *aiocb)
> @@ -665,6 +708,7 @@ int paio_init(void)
>          die2(ret, "pthread_attr_setdetachstate");
>
>      QTAILQ_INIT(&request_list);
> +    new_thread_bh = qemu_bh_new(spawn_thread_bh_fn, NULL);
>
>      posix_aio_state = s;
>      return 0;
> --
> 1.7.5.3
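
For reference, a rough sketch of the alternative asked about above: widening the
new worker's affinity mask, either in the pthread attributes before
pthread_create(), or with sched_setaffinity() as the worker's first action, so a
thread spawned from a pinned vcpu thread does not stay pinned to that vcpu's
cpu. This is not part of the patch; it assumes a Linux/glibc host where
pthread_attr_setaffinity_np() is available, and the die(), all_cpus_mask(),
create_unpinned_thread() and worker() helpers are purely illustrative.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void die(const char *what)
{
    fprintf(stderr, "%s failed\n", what);
    abort();
}

/* Build a mask covering every CPU configured on the host. */
static void all_cpus_mask(cpu_set_t *mask)
{
    long i, ncpus = sysconf(_SC_NPROCESSORS_CONF);

    if (ncpus < 1 || ncpus > CPU_SETSIZE) {
        ncpus = CPU_SETSIZE;
    }
    CPU_ZERO(mask);
    for (i = 0; i < ncpus; i++) {
        CPU_SET(i, mask);
    }
}

/* Option 1: widen the affinity in the attr before pthread_create(). */
static void create_unpinned_thread(pthread_t *tid, void *(*fn)(void *),
                                   void *arg)
{
    pthread_attr_t attr;
    cpu_set_t mask;

    all_cpus_mask(&mask);
    if (pthread_attr_init(&attr)) die("pthread_attr_init");
    if (pthread_attr_setaffinity_np(&attr, sizeof(mask), &mask))
        die("pthread_attr_setaffinity_np");
    if (pthread_create(tid, &attr, fn, arg)) die("pthread_create");
    pthread_attr_destroy(&attr);
}

/* Option 2: the worker resets its own affinity as its first action. */
static void *worker(void *arg)
{
    cpu_set_t mask;

    all_cpus_mask(&mask);
    if (sched_setaffinity(0, sizeof(mask), &mask)) die("sched_setaffinity");
    /* ... an aio_thread()-style request loop would go here ... */
    return NULL;
}

int main(void)
{
    pthread_t tid;

    create_unpinned_thread(&tid, worker, NULL);
    pthread_join(tid, NULL);
    return 0;
}

Either call only addresses the affinity inheritance; the other latency source in
the commit message (up to 2ms of pthread_create calls performed under the global
lock in a vcpu thread) is what the bottom-half/spawning-thread scheme in the
patch is avoiding.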