On Thu, Mar 10, 2016 at 06:04:06PM +0100, Andrea Arcangeli wrote:
> that costs memory in the mm unless we're lucky with the slab hw
> alignment), then I think synchronize_srcu may actually be preferable
> than a full synchronize_sched that affects the entire system with
> thousand of CPUs. A per-cpu inc wouldn't be a big deal and it would at
> least avoid to stall for the whole system if a stall eventually has to
> happen (unless every cpu is actually running gup_fast but that's ok in
> such case).

Thinking more about this, it'd be ok if the pgtable-freeing srcu context
were global; there's no need to mess with the mm. A __percpu counter
inside the mm wouldn't fly anyway. With srcu we'd wait only for those
CPUs that are actually inside gup_fast, which most of the time is none
or just a few.

The main worry about synchronize_sched on x86 is that it doesn't scale
as the CPU count increases, and there can be thousands of CPUs. srcu has
a much smaller issue there: checking the per-cpu variables is almost
instantaneous even with thousands of CPUs, and while local_irq_disable
may hurt in gup_fast, srcu_read_lock is unlikely to be measurable.
__gup_fast would also still be safe to call from irq context.

If srcu causes problems for preempt-RT, you could use synchronize_sched
there and the model would remain the same.
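
To make the proposed model concrete, here is a minimal kernel-style
sketch (not a buildable patch). The global SRCU domain and the helper
names (gup_fast_srcu, walk_and_pin_pages, tlb_remove_table_sync) are
hypothetical illustrations, not existing kernel symbols:

```c
/* Sketch only: assumes a single global SRCU domain guarding fast-GUP
 * walks, as discussed above. All identifiers below are hypothetical. */
#include <linux/srcu.h>

DEFINE_STATIC_SRCU(gup_fast_srcu);	/* global, not per-mm */

/* Fast GUP read side: an srcu_read_lock in place of (or alongside)
 * local_irq_disable(); cheap enough that it should not be measurable. */
int __gup_fast(unsigned long start, int nr_pages, struct page **pages)
{
	int idx, ret;

	idx = srcu_read_lock(&gup_fast_srcu);
	ret = walk_and_pin_pages(start, nr_pages, pages); /* hypothetical walker */
	srcu_read_unlock(&gup_fast_srcu, idx);
	return ret;
}

/* Page-table freeing side: synchronize_srcu() waits only for CPUs
 * currently inside the srcu read section (i.e. actually in gup_fast),
 * instead of stalling every CPU as synchronize_sched() would. */
static void tlb_remove_table_sync(void)
{
	synchronize_srcu(&gup_fast_srcu);
}
```

The point of the sketch is the asymmetry: the read side adds only a
per-cpu increment per gup_fast call, while the (rare) freeing side pays
the grace-period cost, and that cost scales with the number of CPUs in
the read section rather than with the total CPU count.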