[PATCH] use unfair spinlock when running on hypervisor.

Gleb Natapov <gleb@xxxxxxxxxx> · Tue, 1 Jun 2010 12:35:15 +0300

Ticket lock spinlock ensures fairness by introducing FIFO of cpus waiting
for spinlock to be released. This works great on real HW, but when running
on hypervisor it introduce very heavy performance hit if physical cpus
are overcommitted (up to 35% in my test). The reason for performance
hit is that vcpu that at the head of the FIFO can be unscheduled and no
other vcpu waiting on the lock can proceed.  This result in a big time
gaps where no vcpu is in critical section. Even worse they are constantly
spinning and not allowing vcpu at the head of the FIFO to be scheduled
to run.

The patch below allows to patch ticket spinlock code to behave similar to
old unfair spinlock when hypervisor is detected. After patching unlocked
condition remains the same (next == owner), but if vcpu detects that lock
is already taken it spins till (next == owner == 0) and when the condition
is met it tries to reacquire the lock. Unlock simply zeroes head and tail.
Trylock state the same since unlocked condition stays the same.

I ran two guest with 16 vcpus each one. One guest with this patch another
without. Inside those guests I ran kernel compilation "time make -j 16 all".
Here are results of two runs:

patched:                        unpatched:
real 13m34.674s                 real 19m35.827s
user 96m2.638s                  user 102m38.665s
sys 56m14.991s                  sys 158m22.470s

real 13m3.768s                  real 19m4.375s
user 95m34.509s                 user 111m9.903s
sys 53m40.550s                  sys 141m59.370s

Note that for 6 minutes unpatched guest ran without cpu oversubscription
since patched guest was already idle.

Running kernbench on the host with and without the patch:

Unpatched:                                            Patched:
Average Half load -j 8 Run (std deviation):
Elapsed Time 312.538 (0.779404)                       Elapsed Time 311.974 (0.258031)
User Time 2154.89 (6.62261)                           User Time 2151.94 (1.96702)
System Time 233.78 (1.96642)                          System Time 236.472 (0.695572)
Percent CPU 763.8 (1.64317)                           Percent CPU 765 (0.707107)
Context Switches 39348.2 (1542.87)                    Context Switches 41522.4 (1193.13)
Sleeps 871688 (5596.52)                               Sleeps 877317 (1135.83)

Average Optimal load -j 16 Run (std deviation):
Elapsed Time 259.878 (0.444826)                       Elapsed Time 258.842 (0.413122)
User Time 2549.22 (415.685)                           User Time 2538.42 (407.383)
System Time 261.084 (28.8145)                         System Time 263.847 (28.8638)
Percent CPU 1003.4 (252.566)                          Percent CPU 1003.5 (251.407)
Context Switches 228418 (199308)                      Context Switches 228894 (197533)
Sleeps 898721 (28757.2)                               Sleeps 902020 (26074.5)

Signed-off-by: Gleb Natapov <gleb@xxxxxxxxxx>

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3089f70..b919b54 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -60,19 +60,27 @@
 
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-	short inc = 0x0100;
+	short inc;
 
 	asm volatile (
+		"1:\t\n"
+		"mov $0x100, %0\n\t"
 		LOCK_PREFIX "xaddw %w0, %1\n"
-		"1:\t"
+		"2:\t"
 		"cmpb %h0, %b0\n\t"
-		"je 2f\n\t"
+		"je 4f\n\t"
+		"3:\t\n"
 		"rep ; nop\n\t"
-		"movb %1, %b0\n\t"
 		/* don't need lfence here, because loads are in-order */
-		"jmp 1b\n"
-		"2:"
-		: "+Q" (inc), "+m" (lock->slock)
+		ALTERNATIVE(
+		"movb %1, %b0\n\t"
+		"jmp 2b\n",
+		"nop", X86_FEATURE_HYPERVISOR)"\n\t"
+		"cmpw $0, %1\n\t"
+		"jne 3b\n\t"
+		"jmp 1b\n\t"
+		"4:"
+		: "=Q" (inc), "+m" (lock->slock)
 		:
 		: "memory", "cc");
 }
@@ -98,10 +106,13 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
-	asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
-		     : "+m" (lock->slock)
-		     :
-		     : "memory", "cc");
+	asm volatile(
+		ALTERNATIVE(UNLOCK_LOCK_PREFIX "incb (%0);"ASM_NOP3,
+			    UNLOCK_LOCK_PREFIX "movw $0, (%0)",
+			    X86_FEATURE_HYPERVISOR)
+		:
+		: "Q" (&lock->slock)
+		: "memory", "cc");
 }
 #else
 #define TICKET_SHIFT 16
--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html