Re: [PATCH RESEND -rt] mm: perform lru_add_drain_all() remotely

Luiz Capitulino <lcapitulino@xxxxxxxxxx> · Thu, 12 May 2016 09:51:49 -0400

On Thu, 12 May 2016 10:42:30 +0200
Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:

> * Luiz Capitulino | 2016-05-09 10:50:37 [-0400]:
> 
> >diff --git a/include/linux/locallock.h b/include/linux/locallock.h
> >index 6fe5928..f4cd691 100644
> >--- a/include/linux/locallock.h
> >+++ b/include/linux/locallock.h
> >@@ -104,6 +104,17 @@ static inline void __local_unlock(struct local_irq_lock *lv)
> > 		put_local_var(lvar);				\
> > 	} while (0)
> > 
> >+#define local_lock_other_cpu(lvar, cpu)                         \  
> 
> I would prefer to have local_lock_on() to be in sync with
> local_lock_irq_on()
> 
> >+	do {                                                    \
> >+		__local_lock(&per_cpu(lvar, cpu));              \
> >+	} while (0)
> >+
> >+#define local_unlock_other_cpu(lvar, cpu)                       \
> >+	do {                                                    \
> >+		__local_unlock(&per_cpu(lvar, cpu));            \
> >+	} while (0)
> >+
> >+
> > static inline void __local_lock_irq(struct local_irq_lock *lv)
> > {
> > 	spin_lock_irqsave(&lv->lock, lv->flags);
> >@@ -163,6 +174,22 @@ static inline int __local_lock_irqsave(struct local_irq_lock *lv)
> > 		_flags = per_cpu(lvar, cpu).flags;			\
> > 	} while (0)
> > 
> >+#define local_lock_irqsave_other_cpu(lvar, _flags, cpu)			\  
> 
> why not local_lock_irq_on()? 

I can change both.

> >+	do {								\
> >+		if (cpu == smp_processor_id())				\
> >+			local_lock_irqsave(lvar, _flags);		\
> >+		else							\
> >+			local_lock_other_cpu(lvar, cpu);		\
> >+	} while (0)
> >+  
> …
> 
> >diff --git a/mm/swap.c b/mm/swap.c
> >index ca194ae..84c3c21 100644
> >--- a/mm/swap.c
> >+++ b/mm/swap.c
> >@@ -821,9 +821,9 @@ void lru_add_drain_cpu(int cpu)
> > 		unsigned long flags;
> > 
> > 		/* No harm done if a racing interrupt already did this */
> >-		local_lock_irqsave(rotate_lock, flags);
> >+		local_lock_irqsave_other_cpu(rotate_lock, flags, cpu);
> > 		pagevec_move_tail(pvec);
> >-		local_unlock_irqrestore(rotate_lock, flags);
> >+		local_unlock_irqrestore_other_cpu(rotate_lock, flags, cpu);  
> 
> This piece might be required independently of this patch. It would be
> nice to have it in page_alloc_cpu_notify() :) I take care of this…
> 
> > 	}
> > 
> > 	pvec = &per_cpu(lru_deactivate_file_pvecs, cpu);
> >@@ -866,12 +866,32 @@ void lru_add_drain(void)
> > 	local_unlock_cpu(swapvec_lock);
> > }
> > 
> >+static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);
> >+
> >+#ifdef CONFIG_PREEMPT_RT_BASE
> >+static inline void remote_lru_add_drain(int cpu, struct cpumask *has_work)
> >+{
> >+	local_lock_other_cpu(swapvec_lock, cpu);
> >+	lru_add_drain_cpu(cpu);
> >+	local_unlock_other_cpu(swapvec_lock, cpu);
> >+}
> >+#else
> > static void lru_add_drain_per_cpu(struct work_struct *dummy)
> > {
> > 	lru_add_drain();
> > }
> > 
> >-static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);
> >+static inline void remote_lru_add_drain(int cpu, struct cpumask *has_work)
> >+{
> >+		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
> >+
> >+		INIT_WORK(work, lru_add_drain_per_cpu);
> >+		schedule_work_on(cpu, work);
> >+		cpumask_set_cpu(cpu, has_work);
> >+
> >+}
> >+#endif
> >+  
> 
> So on RT you use the local locks to perform $work which should be done
> on the remote CPU locally. RT's local locks help here. You can't do this
> in the !RT case because we optimize the local locks away so you have to
> stick with the schedule_work_on() way of dealing with it.

Exactly.

> In your description you write
> >lru_add_drain_all() works by scheduling lru_add_drain_cpu() to run
> >on all CPUs that have non-empty LRU pagevecs and then waiting for
> >the scheduled work to complete. However, workqueue threads may never
> >have the chance to run on a CPU that's running a SCHED_FIFO task.
> >This causes lru_add_drain_all() to block forever.  
> 
> It might block forever but not necessarily. By default the scheduler
> would push RT tasks off the CPU if they run for too long. So you have
> disabled this (and I don't complain about it). And once this does not
> happen, there is nothing to force the workqueue to run.
> But this is not a RT-only problem, is it? You could run high prio RT
> task on a CPU and the workqueue would never get on the CPU as well.

That's correct. This is an -RT patch for two reasons:

 1. This solution builds on top of the per-cpu locks API implemented
    by -RT and already in place in the pagevecs drain code

 2. As SCHED_FIFO is the norm in -RT, the problem is more common there
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html