Hi Srivatsa, I think there is some elegance in Lai's proposal of using a local trylock for the reader uncontended case and global rwlock to deal with the contended case without deadlocks. He apparently didn't realize initially that nested read locks are common, and he seems to have confused you because of that, but I think his proposal could be changed easily to account for that and result in short, easily understood code. What about the following: - local_refcnt is a local lock count; it indicates how many recursive locks are taken using the local lglock - lglock is used by readers for local locking; it must be acquired before local_refcnt becomes nonzero and released after local_refcnt goes back to zero. - fallback_rwlock is used by readers for global locking; it is acquired when fallback_reader_refcnt is zero and the trylock fails on lglock +void lg_rwlock_local_read_lock(struct lgrwlock *lgrw) +{ + preempt_disable(); + + if (__this_cpu_read(*lgrw->local_refcnt) || + arch_spin_trylock(this_cpu_ptr(lgrw->lglock->lock))) { + __this_cpu_inc(*lgrw->local_refcnt); + rwlock_acquire_read(&lgrw->fallback_rwlock->lock_dep_map, 0, 0, _RET_IP_); + } else { + read_lock(&lgrw->fallback_rwlock); + } +} +EXPORT_SYMBOL(lg_rwlock_local_read_lock); + +void lg_rwlock_local_read_unlock(struct lgrwlock *lgrw) +{ + if (likely(__this_cpu_read(*lgrw->local_refcnt))) { + rwlock_release(&lgrw->fallback_rwlock->lock_dep_map, 1, _RET_IP_); + if (!__this_cpu_dec_return(*lgrw->local_refcnt)) + arch_spin_unlock(this_cpu_ptr(lgrw->lglock->lock)); + } else { + read_unlock(&lgrw->fallback_rwlock); + } + + preempt_enable(); +} +EXPORT_SYMBOL(lg_rwlock_local_read_unlock); + +void lg_rwlock_global_write_lock(struct lgrwlock *lgrw) +{ + int i; + + preempt_disable(); + + for_each_possible_cpu(i) + arch_spin_lock(per_cpu_ptr(lgrw->lglock->lock, i)); + write_lock(&lgrw->fallback_rwlock); +} +EXPORT_SYMBOL(lg_rwlock_global_write_lock); + +void lg_rwlock_global_write_unlock(struct lgrwlock *lgrw) +{ + int i; + + write_unlock(&lgrw->fallback_rwlock); + for_each_possible_cpu(i) + arch_spin_unlock(per_cpu_ptr(lgrw->lglock->lock, i)); + + preempt_enable(); +} +EXPORT_SYMBOL(lg_rwlock_global_write_unlock); This is to me relatively easier to understand than Srivatsa's proposal. Now I'm not sure who wins efficiency wise, but I think it should be relatively close as readers at least don't touch shared state in the uncontended case (even with some recursion going on). There is an interesting case where lg_rwlock_local_read_lock could be interrupted after getting the local lglock but before incrementing local_refcnt to 1; if that happens any nested readers within that interrupt will have to take the global rwlock read side. I think this is perfectly acceptable as this should not be a common case though (and thus the global rwlock cache line probably wouldn't even bounce between cpus then). -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html