On Mon, 2013-06-17 at 16:35 -0700, Davidlohr Bueso wrote:
> On Tue, 2013-06-18 at 07:20 +0800, Alex Shi wrote:
> > On 06/18/2013 12:22 AM, Davidlohr Bueso wrote:
> > > After a lot of benchmarking, I finally got the ideal results for aim7,
> > > so far: this patch + optimistic spinning with preemption disabled. Just
> > > like optimistic spinning, this patch by itself makes little to no
> > > difference, yet combined is where we actually outperform 3.10-rc5. In
> > > addition, I noticed extra throughput when disabling preemption in
> > > try_optimistic_spin().
> > >
> > > With i_mmap as a rwsem and these changes I could see performance
> > > benefits for alltests (+14.5%), custom (+17%), disk (+11%), high_systime
> > > (+5%), shared (+15%) and short (+4%), most of them after around 500
> > > users; for fewer users, it made little to no difference.
> >
> > A pretty good number. What's the cpu number in your machine? :)
>
> 8-socket, 80 cores (ht off)

David, I wonder if you are interested in trying the experimental patch
below.  It tries to avoid unnecessary writes to sem->count when we are
going to fail the down_write, by executing rwsem_down_write_failed_s
instead of rwsem_down_write_failed.  It should further reduce the cache
line bouncing.  It didn't make a difference for my workload; I wonder if
it may help yours more, in addition to the other two patches.

Right now the patch is an ugly hack.  I'll merge rwsem_down_write_failed_s
and rwsem_down_write_failed into one function if this approach actually
helps things.  I'll clean these three patches up after we have some idea
of their effectiveness.

Thanks.

Tim

Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
---
commit 04c8ad3f21861746d5b7fff55a6ef186a4dd0765
Author: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
Date:   Mon Jun 10 04:50:04 2013 -0700

    Try skip write to rwsem->count when we have active lockers

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 0616ffe..83f9184 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -33,6 +33,7 @@ struct rw_semaphore {
 
 extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_down_write_failed_s(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
diff --git a/kernel/rwsem.c b/kernel/rwsem.c
index cfff143..188f6ea 100644
--- a/kernel/rwsem.c
+++ b/kernel/rwsem.c
@@ -42,12 +42,22 @@ EXPORT_SYMBOL(down_read_trylock);
 /*
  * lock for writing
  */
+
+static void ___down_write(struct rw_semaphore *sem)
+{
+	if (sem->count & RWSEM_ACTIVE_MASK) {
+		rwsem_down_write_failed_s(sem);
+		return;
+	}
+	__down_write(sem);
+}
+
 void __sched down_write(struct rw_semaphore *sem)
 {
 	might_sleep();
 	rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
 
-	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
+	LOCK_CONTENDED(sem, __down_write_trylock, ___down_write);
 }
 
 EXPORT_SYMBOL(down_write);
diff --git a/lib/rwsem.c b/lib/rwsem.c
index 19c5fa9..25143b5 100644
--- a/lib/rwsem.c
+++ b/lib/rwsem.c
@@ -248,6 +248,63 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 	return sem;
 }
 
+struct rw_semaphore __sched *rwsem_down_write_failed_s(struct rw_semaphore *sem)
+{
+	long count, adjustment = 0;
+	struct rwsem_waiter waiter;
+	struct task_struct *tsk = current;
+
+	/* set up my own style of waitqueue */
+	waiter.task = tsk;
+	waiter.type = RWSEM_WAITING_FOR_WRITE;
+
+	raw_spin_lock_irq(&sem->wait_lock);
+	if (list_empty(&sem->wait_list))
+		adjustment += RWSEM_WAITING_BIAS;
+	list_add_tail(&waiter.list, &sem->wait_list);
+
+	/* If there were already threads queued before us and there are no
+	 * active writers, the lock must be read owned; so we try to wake
+	 * any read locks that were queued ahead of us. */
+	if (adjustment == 0) {
+		if (sem->count > RWSEM_WAITING_BIAS)
+			sem = __rwsem_do_wake(sem, RWSEM_WAKE_READERS);
+	} else
+		count = rwsem_atomic_update(adjustment, sem);
+
+	/* wait until we successfully acquire the lock */
+	set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+	while (true) {
+		if (!(sem->count & RWSEM_ACTIVE_MASK)) {
+			/* Try acquiring the write lock. */
+			count = RWSEM_ACTIVE_WRITE_BIAS;
+			if (!list_is_singular(&sem->wait_list))
+				count += RWSEM_WAITING_BIAS;
+
+			if (sem->count == RWSEM_WAITING_BIAS &&
+			    cmpxchg(&sem->count, RWSEM_WAITING_BIAS, count) ==
+							RWSEM_WAITING_BIAS)
+				break;
+		}
+
+		raw_spin_unlock_irq(&sem->wait_lock);
+
+		/* Block until there are no active lockers. */
+		do {
+			schedule();
+			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
+
+		raw_spin_lock_irq(&sem->wait_lock);
+	}
+
+	list_del(&waiter.list);
+	raw_spin_unlock_irq(&sem->wait_lock);
+	tsk->state = TASK_RUNNING;
+
+	return sem;
+}
+
 /*
  * handle waking up a waiter on the semaphore
  * - up_read/up_write has decremented the active part of count if we come here
@@ -289,5 +346,6 @@ struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
 
 EXPORT_SYMBOL(rwsem_down_read_failed);
 EXPORT_SYMBOL(rwsem_down_write_failed);
+EXPORT_SYMBOL(rwsem_down_write_failed_s);
 EXPORT_SYMBOL(rwsem_wake);
 EXPORT_SYMBOL(rwsem_downgrade_wake);
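
For readers following along: the core idea in the patch is to take a plain
read of sem->count first and only attempt the atomic read-modify-write when
it can actually succeed, so contending CPUs mostly issue shared (read)
cache-line requests instead of exclusive ones.  The snippet below is a
minimal stand-alone user-space sketch of that pattern using C11 atomics; it
is not the kernel code above, and the names and bias values (fake_rwsem,
FAKE_ACTIVE_MASK, FAKE_WRITE_BIAS) are made up purely for illustration.

#include <stdatomic.h>
#include <stdbool.h>

#define FAKE_ACTIVE_MASK	0x0000ffffL	/* stand-in for RWSEM_ACTIVE_MASK */
#define FAKE_WRITE_BIAS		0x00010000L	/* stand-in for the write-lock bias */

struct fake_rwsem {
	atomic_long count;
};

static bool fake_down_write_trylock(struct fake_rwsem *sem)
{
	/* Plain load first: if anyone already holds the lock, bail out
	 * without issuing a write, so the cache line stays shared. */
	long old = atomic_load_explicit(&sem->count, memory_order_relaxed);

	if (old & FAKE_ACTIVE_MASK)
		return false;

	/* Only now pay for the exclusive-ownership cmpxchg. */
	return atomic_compare_exchange_strong(&sem->count, &old,
					      old + FAKE_WRITE_BIAS);
}

This is the same test-before-write shape that ___down_write() applies before
falling through to __down_write(), and that the cmpxchg loop in
rwsem_down_write_failed_s() applies once the waiter has been queued.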