On Wed, Sep 16, 2020 at 08:32:20PM +0800, Hou Tao wrote:

> I have simply test the performance impact on both x86 and aarch64.
>
> There is no degradation under x86 (2 sockets, 18 core per sockets, 2 threads per core)

Yeah, x86 is magical here, it's the same single instruction for both ;-)
But it is, afaik, unique in this position, no other arch can pull that
off.

> However the performance degradation is huge under aarch64 (4 sockets, 24 core per sockets): nearly 60% lost.
>
> v4.19.111
> no writer, reader cn                               | 24        | 48        | 72        | 96
> the rate of down_read/up_read per second           | 166129572 | 166064100 | 165963448 | 165203565
> the rate of down_read/up_read per second (patched) |  63863506 |  63842132 |  63757267 |  63514920

Teh hurt :/
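
As a quick sanity check on the "nearly 60% lost" figure, the quoted aarch64 numbers can be compared column by column. The sketch below is only arithmetic on the table from the quoted mail; the variable and function names are made up for illustration and do not come from the thread.

```python
# Rates of down_read/up_read per second, copied from the quoted
# aarch64 table (v4.19.111, no writer).
READERS = [24, 48, 72, 96]
BASELINE = [166129572, 166064100, 165963448, 165203565]
PATCHED = [63863506, 63842132, 63757267, 63514920]


def percent_lost(before, after):
    """Return the throughput drop from before to after, in percent."""
    return (before - after) / before * 100.0


# One loss figure per reader count, in the same order as READERS.
LOSSES = [percent_lost(b, p) for b, p in zip(BASELINE, PATCHED)]

if __name__ == "__main__":
    for n, loss in zip(READERS, LOSSES):
        print(f"{n} readers: {loss:.1f}% lost")
```

Each column works out to a little over 61%, so the drop is essentially the same at every reader count.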