On Thu, Apr 29, 2021 at 12:14:28PM -0700, Michel Lespinasse wrote:
> On Thu, Apr 29, 2021 at 05:12:34PM +0100, Matthew Wilcox wrote:
> > One of the worst things we can do while holding a spinlock is take a
> > cache miss because we then delay for several thousand cycles to wait
> > for the cache line. That gives every other CPU a really long
> > opportunity to slam into the spinlock and things go downhill fast at
> > that point. We've even seen patches to do things like read A, take
> > lock L, then read A to avoid the cache miss while holding the lock.
>
> I understand the effect you are describing, but I do not see how it
> applies here - what cacheline are we likely to miss on when using
> local_irq_disable() that we wouldn't touch if using rcu_read_lock()?

It's the same cache lines in both cases. The difference is that in one
case we have interrupts disabled (and a spinlock held? I wasn't clear
on that) and in the other case, we just have preemption disabled.

> > What sort of performance effect would it have to free page tables
> > under RCU for all architectures? It's painful on s390 & powerpc
> > because different tables share the same struct page, but I have to
> > believe that's a solvable problem.
>
> I agree using RCU to free page tables would be a good thing to try.
> I am afraid of adding that to this patchset though, as it seems
> somewhat unrelated and adds risk. IMO we are most likely to find
> justification for pushing this if/when we try accessing remote mm's
> without taking the mmap lock, since disabling IPIs clearly wouldn't
> work there.

I think that needs to happen _before_ this patchset. Creating a mess
and then trying to clean it up later isn't a great way to do
development.
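
[For concreteness, a minimal sketch of the "read A, take lock L, then
read A" pattern mentioned above: touch the shared data before taking
the lock so the cache line is likely warm during the critical section.
The struct and function names (struct foo, foo_inc) are hypothetical,
for illustration only, not taken from any actual kernel code.]

#include <linux/compiler.h>
#include <linux/spinlock.h>

/* Hypothetical structure, for illustration only. */
struct foo {
	spinlock_t lock;
	unsigned long counter;
};

static void foo_inc(struct foo *f)
{
	/*
	 * Speculative read before taking the lock: if f->counter's
	 * cache line is cold, we eat the miss here, while nobody is
	 * waiting on us, rather than inside the critical section.
	 */
	(void)READ_ONCE(f->counter);

	spin_lock(&f->lock);
	/* The line is (probably) still hot, so this stays short. */
	f->counter++;
	spin_unlock(&f->lock);
}

[The payoff is that other CPUs contending on f->lock spin for the
length of an increment, not for the thousands of cycles a cache miss
can take.]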