On Tue, Oct 30, 2018 at 11:55:46PM +0200, Igor Stoppa wrote:
> On 30/10/2018 23:25, Matthew Wilcox wrote:
> > On Tue, Oct 30, 2018 at 11:51:17AM -0700, Andy Lutomirski wrote:
> > > Finally, one issue: rare_alloc() is going to utterly suck
> > > performance-wise due to the global IPI when the region gets zapped out
> > > of the direct map or otherwise made RO. This is the same issue that
> > > makes all existing XPO efforts so painful. We need to either optimize
> > > the crap out of it somehow or we need to make sure it’s not called
> > > except during rare events like device enumeration.
> >
> > Batching operations is kind of the whole point of the VM ;-) Either
> > this rare memory gets used a lot, in which case we'll want to create slab
> > caches for it, make it a MM zone and the whole nine yards, or it's not
> > used very much in which case it doesn't matter that performance sucks.
> >
> > For now, I'd suggest allocating 2MB chunks as needed, and having a
> > shrinker to hand back any unused pieces.
>
> One of the prime candidates for this sort of protection is IMA.
> In the IMA case, there are ever-growing lists which are populated when
> accessing files.
> It's something that ends up on the critical path of any usual performance
> critical use case, when accessing files for the first time, like at
> boot/application startup.
>
> Also the SELinux AVC is based on lists. It uses an object cache, but it is
> still something that grows and is on the critical path of evaluating the
> callbacks from the LSM hooks. A lot of them.
>
> These are the main two reasons, so far, for me advocating an optimization of
> the write-rare version of the (h)list.

I think these are both great examples of why doubly-linked lists _suck_.
You have to modify three cachelines to add an entry to a list. Walking a
linked list is an exercise in cache misses. Far better to use an XArray /
IDR for this purpose.
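
To make the "three cachelines" point concrete, here is a rough userspace
sketch of a circular doubly-linked list insert. It is not the kernel's
list.h; struct node, list_init(), list_insert_between() and list_add_tail()
are hypothetical stand-ins written for this mail. It assumes each node is
embedded in a separately allocated object, so the new node, its predecessor
and its successor typically sit on three different cachelines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kernel-style list node. */
struct node {
	struct node *prev;
	struct node *next;
};

/* An empty circular list is a head that points at itself. */
static void list_init(struct node *head)
{
	head->prev = head;
	head->next = head;
}

/*
 * Link 'entry' between two adjacent nodes.
 * Returns how many distinct nodes were written, which is the number of
 * cachelines dirtied if every node lives on its own cacheline (assumption).
 */
static int list_insert_between(struct node *entry, struct node *prev,
			       struct node *next)
{
	assert(entry && prev && next);

	next->prev = entry;	/* write to the successor */
	entry->next = next;	/* write to the new node */
	entry->prev = prev;	/* same node as above, same cacheline */
	prev->next = entry;	/* write to the predecessor */

	/* In an empty list, prev and next are both the head. */
	return (prev == next) ? 2 : 3;
}

/* Append 'entry' just before the head, i.e. at the tail of the list. */
static int list_add_tail(struct node *entry, struct node *head)
{
	return list_insert_between(entry, head->prev, head);
}

/* Walk the list; every step dereferences a different node. */
static size_t list_length(const struct node *head)
{
	size_t n = 0;
	const struct node *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

Once the list is non-empty, every list_add_tail() call stores into three
separate nodes, and list_length() chases one pointer per entry, which is
where the cache misses come from.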