Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race

Manfred Spraul <manfred@xxxxxxxxxxxxxxxx> · Thu, 23 Jun 2016 20:55:06 +0200

On 06/21/2016 01:04 AM, Andrew Morton wrote:
On Sat, 18 Jun 2016 22:02:21 +0200 Manfred Spraul <manfred@xxxxxxxxxxxxxxxx> wrote:

Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a race:

sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
- a non-simple operations is ongoing (sma->sem_perm.lock held)
- a complex operation is sleeping (sma->complex_count != 0)

As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions. See below for more details
(or kernel bugzilla 105651).

The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.

The patch also updates stale documentation regarding the locking.

With regards to stable kernels:
The patch is required for all kernels that include the commit 6d07b68ce16a
("ipc/sem.c: optimize sem_lock()") (3.10?)
I've had this in -mm (and -next) since January 4, without issues.  I
put it on hold because Davidlohr expressed concern about performance
regressions.
I had several ideas how to fix it. The initial ideas probably had 
performance issue.

The current one doesn't have any issues. It just took longer than 
expected to test it.
Your [2/2] should prevent those regressions (yes?) so I assume that any
kernel which has [1/2] really should have [2/2] as well.  But without
any quantitative information, this is all mad guesswork.

What to do?
[2/2] is an improvement, it handles one case better than the current code.
If you want:
3.10 improved scalability, but it introduced a performance regression 
for one use case.
[with 3.10, simple ops got parallel, but complex ops had to perform a 
"for_each_sem() {spin_unlock_wait()}"]
The patch fixes that.

--
    Manfred
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html