On Tue, 16 Oct 2018, Frederic Weisbecker wrote:
> On Mon, Oct 15, 2018 at 10:50:48AM +0100, Mel Gorman wrote:
> > On Fri, Oct 12, 2018 at 09:21:41AM +0200, Vlastimil Babka wrote:
> > > On 9/14/18 4:59 PM, Sebastian Andrzej Siewior wrote:
> > > I think this evaluation is missing the other side of the story, and
> > > that's the cost of using a spinlock (even uncontended) instead of
> > > disabling preemption. The expectation for the LRU pagevec is that the
> > > local operations will be much more common than draining of other CPUs,
> > > so it's optimized for the former.
> >
> > Agreed, the drain operation should be extremely rare except under heavy
> > memory pressure, particularly if mixed with THP allocations. The overall
> > intent seems to be improving lockdep coverage, but I don't think we
> > should take a hit in the common case just to get that coverage. Bear in
> > mind that the main point of the pagevec (whether it's true or not) is to
> > avoid the much heavier LRU lock.
>
> So indeed, if the only purpose of this patch were to make lockdep wiser,
> a pair of spin_lock_acquire() / spin_unlock_release() would be enough to
> teach it and would avoid the overhead.
>
> Now another significant incentive behind this change is to improve CPU
> isolation. Workloads relying on owning the entire CPU without being
> disturbed are interested in this, as it allows offloading some noise.
> That's no big deal for those who can tolerate rare events, but CPU
> isolation is often combined with deterministic latency requirements.
>
> So I'm not saying this per-CPU spinlock is necessarily the right answer;
> I don't know that code well enough to have an opinion, but I still hope
> we can find a solution.

One way to solve this, and I have already played with it, is to make the
smp function call based variant and the lock based variant switchable at
boot time with a static key. That way CPU isolation can select it and take
the penalty, while normal workloads are not affected.

Thanks,

	tglx
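
For illustration, a minimal sketch of the kind of boot-time switch described
above might look as follows. This is not from any posted patch: the names
lru_remote_drain_key, lru_pvec_lock, lru_pvec_add_drain and the
"lru_remote_drain" boot parameter are made up for the example; only the
standard static key, per-CPU and pagevec APIs are assumed.

/*
 * Minimal sketch, not from any posted patch: a static key chosen at boot
 * selects between the existing preempt-disable fast path (with remote
 * draining done via queued work / smp function calls) and a per-CPU
 * spinlock that remote CPUs may take directly.
 */
#include <linux/init.h>
#include <linux/jump_label.h>
#include <linux/pagevec.h>
#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/smp.h>
#include <linux/spinlock.h>

static DEFINE_STATIC_KEY_FALSE(lru_remote_drain_key);
static DEFINE_PER_CPU(spinlock_t, lru_pvec_lock);

/*
 * Opt in on the command line, e.g. together with nohz_full=/isolcpus=.
 * This assumes jump_label_init() has already run when the parameter is
 * parsed; otherwise the flip would have to be deferred to an initcall.
 */
static int __init lru_remote_drain_setup(char *str)
{
	static_branch_enable(&lru_remote_drain_key);
	return 1;
}
__setup("lru_remote_drain", lru_remote_drain_setup);

static int __init lru_pvec_lock_init(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		spin_lock_init(per_cpu_ptr(&lru_pvec_lock, cpu));
	return 0;
}
core_initcall(lru_pvec_lock_init);

static void lru_pvec_add_drain(struct pagevec *pvec)
{
	if (static_branch_unlikely(&lru_remote_drain_key)) {
		/*
		 * Lock based variant: isolated setups pay for a (mostly
		 * uncontended) per-CPU lock, but a remote CPU can drain
		 * this pagevec without an IPI or a queued work item.
		 */
		spinlock_t *lock = get_cpu_ptr(&lru_pvec_lock);

		spin_lock(lock);
		__pagevec_lru_add(pvec);
		spin_unlock(lock);
		put_cpu_ptr(&lru_pvec_lock);
	} else {
		/* Default variant: plain preempt-disabled local operation. */
		preempt_disable();
		__pagevec_lru_add(pvec);
		preempt_enable();
	}
}

With static_branch_unlikely() the default path compiles down to a patched-out
branch, so workloads that never ask for isolation keep the current
preempt-disable behaviour, while isolated, latency-sensitive setups take the
per-CPU lock in exchange for remote draining that no longer disturbs their
CPUs, which matches the trade-off described above.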