On Tue, Jan 26, 2021 at 9:55 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 26/01/21 18:47, Ben Gardon wrote:
> > Enough that it motivated me to implement this more complex union
> > scheme. While the difference was pronounced in the dirty log perf test
> > microbenchmark, it's an open question as to whether it would matter in
> > practice.
>
> I'll look at getting some numbers if it's just the dirty log perf test.
> Did you see anything in the profile pointing specifically at rwlock?

When I did a strict replacement I found ~10% worse memory population
performance.
Running dirty_log_perf_test -v 96 -b 3g -i 5 with the TDP MMU
disabled, I got 119 sec to populate memory as the baseline and 134 sec
with an earlier version of this series which just replaced the
spinlock with an rwlock. I believe this difference is statistically
significant, but didn't run multiple trials.
I didn't take notes when profiling, but I'm pretty sure the rwlock
slowpath showed up a lot. This was a very high contention scenario, so
it's probably not indicative of real-world performance.
In the slow path, the rwlock is certainly slower than a spin lock.

If the real impact doesn't seem too large, I'd be very happy to just
replace the spinlock.

>
> Paolo
>
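
For reference, the two timings quoted above (119 sec baseline, 134 sec with the strict rwlock replacement) work out to roughly a 12.6% slowdown, which is in line with the "~10%" figure. A minimal sketch of that arithmetic in Python follows; the function name is made up for illustration, and since each number comes from a single run, the result is only a rough estimate:

```python
def relative_slowdown(baseline_sec: float, candidate_sec: float) -> float:
    """Return the fractional increase of candidate_sec over baseline_sec."""
    if baseline_sec <= 0:
        raise ValueError("baseline_sec must be positive")
    return (candidate_sec - baseline_sec) / baseline_sec


# Timings quoted in the mail: spinlock baseline vs. strict rwlock replacement.
slowdown = relative_slowdown(119.0, 134.0)
print(f"{slowdown:.1%}")  # 12.6%
```

Repeating the run several times and comparing means (or medians) would be needed before treating the difference as statistically significant.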