On Sat, 2012-07-28 at 12:41 -0400, Mikulas Patocka wrote:
> Introduce percpu rw semaphores
>
> When many CPUs are locking a rw semaphore for read concurrently, cache
> line bouncing occurs. When a CPU acquires rw semaphore for read, the
> CPU writes to the cache line holding the semaphore. Consequently, the
> cache line is being moved between CPUs and this slows down semaphore
> acquisition.
>
> This patch introduces new percpu rw semaphores. They are functionally
> identical to existing rw semaphores, but locking the percpu rw semaphore
> for read is faster and locking for write is slower.
>
> The percpu rw semaphore is implemented as a percpu array of rw
> semaphores, each semaphore for one CPU. When some thread needs to lock
> the semaphore for read, only semaphore on the current CPU is locked for
> read. When some thread needs to lock the semaphore for write, semaphores
> for all CPUs are locked for write. This avoids cache line bouncing.
>
> Note that the thread that is locking percpu rw semaphore may be
> rescheduled, it doesn't cause bug, but cache line bouncing occurs in
> this case.
>
> Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>

I am curious to see how this performs with 4096 CPUs.

Really, you shouldn't use an rwlock in a path where it might hurt
performance.

RCU is probably a better answer.

(bdev->bd_block_size should be read exactly once.)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel