Le lundi 21 mars 2011 Ã 23:57 +0545, Ben Nagy a Ãcrit : > On Mon, Mar 21, 2011 at 10:57 PM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote: > > Le lundi 21 mars 2011 Ã 19:02 +0200, Avi Kivity a Ãcrit : > > > >> Any ideas on how to fix it? We could pre-allocate IDs and batch them in > >> per-cpu caches, but it seems like a lot of work. > >> > > > > Hmm, I dont know what syscalls kvm do, but even a timer_gettime() has to > > lock this idr_lock. > > > > Sounds crazy, and should be fixed in kernel code, not kvm ;) > > Hi, > > I'll need to work out a way I can make the perf.data available > (~100M), but here's the summary > > http://paste.ubuntu.com/583425/ > > And here's the summary of the summary > > # Overhead Command Shared Object > Symbol > # ........ ............... .................... > .......................................... > # > 71.86% kvm [kernel.kallsyms] [k] __ticket_spin_lock > | > --- __ticket_spin_lock > | > |--54.19%-- default_spin_lock_flags > | _raw_spin_lock_irqsave > | | > | --54.14%-- __lock_timer > | | > | |--31.92%-- sys_timer_gettime > | | system_call_fastpath > | | > | --22.22%-- sys_timer_settime > | system_call_fastpath > | > |--15.66%-- _raw_spin_lock > [...] > > Which I guess seems to match what Eric just said. > > I'll post a link to the full data tomorrow. Many thanks for the help > so far. If it's a kernel scaling limit then I guess we just wait until > the kernel gets better. :S I'll check it out with a linux guest > tomorrow anyway. Here is a quick & dirty patch to lower idr_lock use. You could try it eventually ;) Thanks [RFC] posix-timers: RCU conversion Ben Nagy reported a scalability problem with KVM/QEMU that hit very hard a single spinlock (idr_lock) in posix-timers code. Even on a 16 cpu machine (2x4x2), a single test can show 98% of cpu time used in ticket_spin_lock Switching to RCU should be quite easy, IDR being already RCU ready. idr_lock should be locked only before an insert/delete, not a find. Reported-by: Ben Nagy <ben@xxxxxxxx> Signed-off-by: Eric Dumazet <eric.dumazet@xxxxxxxxx> Cc: Avi Kivity <avi@xxxxxxxxxx> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Cc: John Stultz <johnstul@xxxxxxxxxx> Cc: Richard Cochran <richard.cochran@xxxxxxxxxx> --- include/linux/posix-timers.h | 1 + kernel/posix-timers.c | 26 ++++++++++++++------------ 2 files changed, 15 insertions(+), 12 deletions(-) diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h index d51243a..5dc27ca 100644 --- a/include/linux/posix-timers.h +++ b/include/linux/posix-timers.h @@ -81,6 +81,7 @@ struct k_itimer { unsigned long expires; } mmtimer; } it; + struct rcu_head rcu; }; struct k_clock { diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c index 4c01249..e2a823a 100644 --- a/kernel/posix-timers.c +++ b/kernel/posix-timers.c @@ -491,6 +491,11 @@ static struct k_itimer * alloc_posix_timer(void) return tmr; } +static void k_itimer_rcu_free(struct rcu_head *head) +{ + kmem_cache_free(posix_timers_cache, container_of(head, struct k_itimer, rcu)); +} + #define IT_ID_SET 1 #define IT_ID_NOT_SET 0 static void release_posix_timer(struct k_itimer *tmr, int it_id_set) @@ -499,11 +504,12 @@ static void release_posix_timer(struct k_itimer *tmr, int it_id_set) unsigned long flags; spin_lock_irqsave(&idr_lock, flags); idr_remove(&posix_timers_id, tmr->it_id); + tmr->it_id = -1; spin_unlock_irqrestore(&idr_lock, flags); } put_pid(tmr->it_pid); sigqueue_free(tmr->sigq); - kmem_cache_free(posix_timers_cache, tmr); + call_rcu(&tmr->rcu, k_itimer_rcu_free); } static struct k_clock *clockid_to_kclock(const clockid_t id) @@ -631,22 +637,18 @@ out: static struct k_itimer *__lock_timer(timer_t timer_id, unsigned long *flags) { struct k_itimer *timr; - /* - * Watch out here. We do a irqsave on the idr_lock and pass the - * flags part over to the timer lock. Must not let interrupts in - * while we are moving the lock. - */ - spin_lock_irqsave(&idr_lock, *flags); + + rcu_read_lock(); timr = idr_find(&posix_timers_id, (int)timer_id); if (timr) { - spin_lock(&timr->it_lock); - if (timr->it_signal == current->signal) { - spin_unlock(&idr_lock); + spin_lock_irqsave(&timr->it_lock, *flags); + if (timr->it_id == (int)timer_id && timr->it_signal == current->signal) { + rcu_read_unlock(); return timr; } - spin_unlock(&timr->it_lock); + spin_unlock_irqrestore(&timr->it_lock, *flags); } - spin_unlock_irqrestore(&idr_lock, *flags); + rcu_read_unlock(); return NULL; } -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html