Re: [tip:core/locking] x86/smp: Move waiting on contended ticket lock out of line

Rik van Riel <riel@xxxxxxxxxx> · Wed, 27 Feb 2013 11:42:21 -0500

On 02/13/2013 08:21 PM, Linus Torvalds wrote:

On Wed, Feb 13, 2013 at 3:41 PM, Rik van Riel <riel@xxxxxxxxxx> wrote:

>> I have an example of the second case. It is a test case
>> from a customer issue, where an application is contending on
>> semaphores, doing semaphore lock and unlock operations. The
>> test case simply has N threads, trying to lock and unlock the
>> same semaphore.
>>
>> The attached graph (which I sent out with the intro email to
>> my patches) shows how reducing the memory accesses from the
>> spinlock wait path prevents the large performance degradation
>> seen with the vanilla kernel. This is on a 24 CPU system with
>> 4 6-core AMD CPUs.
>>
>> The "prop-N" series are with a fixed delay proportional back-off.
>> You can see that a small value of N does not help much for large
>> numbers of cpus, and a large value hurts with a small number of
>> CPUs. The automatic tuning appears to be quite robust.
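
As a rough illustration of the "prop-N" idea described above, here is a user-space ticket lock in C11 where a waiter pauses for N delay loops per waiter ahead of it before re-reading the lock word. This is only a sketch, not the kernel patch: the names ticket_lock, BACKOFF_N, ticket_backoff_loops and ticket_delay are made up for the example, the value of N is arbitrary, and the assumption that the delay scales linearly with queue position is my reading of "proportional back-off".

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space ticket lock; not the kernel's arch_spinlock_t. */
struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently holding the lock */
};

/* The "N" in prop-N: delay loops per waiter ahead of us (assumed value). */
#define BACKOFF_N 50u

static void ticket_lock_init(struct ticket_lock *l)
{
	atomic_init(&l->next, 0);
	atomic_init(&l->owner, 0);
}

/* Delay grows linearly with our distance from the head of the queue. */
static unsigned ticket_backoff_loops(unsigned my_ticket, unsigned owner)
{
	return (my_ticket - owner) * BACKOFF_N;	/* unsigned math survives wrap */
}

static void ticket_delay(unsigned loops)
{
	/* volatile keeps the compiler from deleting the empty loop */
	for (volatile unsigned i = 0; i < loops; i++)
		;
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
	unsigned me = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

	for (;;) {
		unsigned cur = atomic_load_explicit(&l->owner, memory_order_acquire);

		if (cur == me)
			return;
		/* Stay off the contended cache line while others are ahead. */
		ticket_delay(ticket_backoff_loops(me, cur));
	}
}

static void ticket_lock_release(struct ticket_lock *l)
{
	/* Releasing a lock nobody holds would corrupt the queue. */
	assert(atomic_load_explicit(&l->owner, memory_order_relaxed) !=
	       atomic_load_explicit(&l->next, memory_order_relaxed));
	atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```

With a fixed BACKOFF_N, a waiter far back in a long queue still polls the lock word many times before its turn, while a waiter next in line on a small system may sleep through the release, which matches the trade-off between small and large N noted above.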

> Ok, good, so there are some numbers. I didn't see any in the commit
> messages anywhere, and since the threads I've looked at are from
> tip-bot, I never saw the intro email.

Some people at HP have collected an extensive list of AIM7 results,
covering all the different AIM7 workloads, on an 80-core DL-980 with
HT disabled.

The AIM7 workloads all work by slowly increasing the number of
worker processes, all of which have some duty cycle (busy & sleep).
Adding more processes tends to increase the number of jobs/minute
completed, up to a certain point. For some workloads, performance
reaches a peak and then levels off at or near that peak. For other
workloads, performance drops to a lower plateau when more processes
are added beyond the peak.

To keep the results readable and relevant, I am reporting the
plateau performance numbers. Comments are given where required.

workload        3.7.6 vanilla    3.7.6 w/ backoff   comments

all_utime       333000           333000
alltests        300000-470000    180000-440000      large variability
compute         528000           528000
custom          290000-320000    250000-330000      4 fast runs, 1 slow
dbase           920000           925000
disk            100000           90000-120000       similar plateau, wild
                                                    swings with patches
five_sec        140000           140000
fserver         160000-300000    250000-430000      w/ patch drops off at
                                                    higher number of users
high_systime    80000-110000     30000-125000       w/ patch mostly 40k-70k,
                                                    wild swings
long            no performance plateau, equal performance for both
new_dbase       960000           96000
new_fserver     150000-300000    210000-420000      vanilla drops off,
                                                    w/ patches wild swings
shared          270000-440000    120000-440000      all runs ~equal to
                                                    vanilla up to 1000
                                                    users, one out of 5
                                                    runs slows down past
                                                    1100 users
short           120000           190000

In conclusion, the spinlock backoff patches seem to significantly
improve performance in workloads where there is simple contention
on just one or two spinlocks. However, in more complex workloads,
high variability is seen, including performance regression in some
test runs.

One hypothesis is that before the spinlock backoff patches, the
workloads get contention (and bottleneck) on multiple locks. With
the patches, the contention on some of the locks is relieved, and
more tasks bunch up on the remaining bottlenecks, leading to worse
performance.

> That said, it's interesting that this happens with the semaphore path.
> We've had other cases where the spinlocks in the *sleeping* locks have
> caused problems, and I wonder if we should look at that path in
> particular.

If we want reliably improved performance without unpredictable
performance swings, we should probably change some of the kernel's
spinlocks, especially the ones embedded in sleeping locks, into
scalable locks like Michel's implementation of MCS locks.
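
For readers who have not looked at MCS locks, a minimal user-space sketch in C11 follows. It is the textbook Mellor-Crummey/Scott queue lock, not Michel's kernel implementation, and the names mcs_lock, mcs_node, mcs_acquire and mcs_release are made up for the example. The point it shows is that each waiter spins on a flag in its own queue node rather than on the shared lock word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiter; each waiter spins only on its own node. */
struct mcs_node {
	_Atomic(struct mcs_node *) next;
	atomic_bool locked;	/* true while this waiter must keep spinning */
};

/* The lock itself is just a pointer to the tail of the waiter queue. */
struct mcs_lock {
	_Atomic(struct mcs_node *) tail;
};

static void mcs_init(struct mcs_lock *lock)
{
	atomic_init(&lock->tail, NULL);
}

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *self)
{
	struct mcs_node *prev;

	atomic_store_explicit(&self->next, NULL, memory_order_relaxed);
	atomic_store_explicit(&self->locked, true, memory_order_relaxed);

	/* Append ourselves to the queue in one atomic swap. */
	prev = atomic_exchange_explicit(&lock->tail, self, memory_order_acq_rel);
	if (prev == NULL)
		return;		/* queue was empty: lock acquired */

	/* Link behind the previous waiter, then spin on our own flag. */
	atomic_store_explicit(&prev->next, self, memory_order_release);
	while (atomic_load_explicit(&self->locked, memory_order_acquire))
		;
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *self)
{
	struct mcs_node *next;

	/* While the lock is held, the queue can never be empty. */
	assert(atomic_load_explicit(&lock->tail, memory_order_relaxed) != NULL);

	next = atomic_load_explicit(&self->next, memory_order_acquire);
	if (next == NULL) {
		struct mcs_node *expected = self;

		/* No visible successor: try to mark the queue empty. */
		if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected,
				NULL, memory_order_release, memory_order_relaxed))
			return;
		/* A successor swapped in; wait for it to finish linking. */
		while ((next = atomic_load_explicit(&self->next,
				memory_order_acquire)) == NULL)
			;
	}
	/* Hand the lock directly to the next waiter. */
	atomic_store_explicit(&next->locked, false, memory_order_release);
}
```

A caller passes the same node to mcs_acquire() and mcs_release(), typically one allocated on its own stack. Because a release writes only to the successor's node, the hand-off touches one waiter's cache line instead of waking every spinner, which is why this kind of lock is expected to scale better than a ticket lock under heavy contention.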

We may be hitting the limit of what can be done with the current
ticket lock data structure. It simply may not scale as far as the
hardware on which Linux is being run.

--
All rights reversed