On Thu, Mar 10, 2016 at 06:04:06PM +0100, Andrea Arcangeli wrote:
> that costs memory in the mm unless we're lucky with the slab hw
> alignment), then I think synchronize_srcu may actually be preferable
> than a full synchronize_sched that affects the entire system with
> thousand of CPUs. A per-cpu inc wouldn't be a big deal and it would at
> least avoid to stall for the whole system if a stall eventually has to
> happen (unless every cpu is actually running gup_fast but that's ok in
> such case).

Thinking more about this, it'd be ok if the pgtable-freeing srcu context
were global; there's no need to mess with the mm. A __percpu counter
inside the mm wouldn't fly anyway. With srcu we'd wait only for those
CPUs that are actually inside gup_fast, which most of the time is none
or just a few.

The main worry about synchronize_sched on x86 is that it doesn't scale
as the CPU count increases, and there can be thousands of CPUs. srcu has
a much smaller issue there: checking the per-cpu variables is almost
instantaneous even with thousands of CPUs, and while local_irq_disable
may hurt in gup_fast, srcu_read_lock is unlikely to be measurable.
__gup_fast would also still be safe to call from irq context.

If srcu causes problems for preempt-RT, you could use synchronize_sched
there and the model would remain the same.
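
To make the proposed model concrete, here is a minimal kernel-style
sketch (not a buildable patch). The global SRCU domain and the helper
names (gup_fast_srcu, walk_and_pin_pages, tlb_remove_table_sync) are
hypothetical illustrations, not existing kernel symbols:

```c
/* Sketch only: assumes a single global SRCU domain guarding fast-GUP
 * walks, as discussed above. All identifiers below are hypothetical. */
#include <linux/srcu.h>

DEFINE_STATIC_SRCU(gup_fast_srcu);	/* global, not per-mm */

/* Fast GUP read side: an srcu_read_lock in place of (or alongside)
 * local_irq_disable(); cheap enough that it should not be measurable. */
int __gup_fast(unsigned long start, int nr_pages, struct page **pages)
{
	int idx, ret;

	idx = srcu_read_lock(&gup_fast_srcu);
	ret = walk_and_pin_pages(start, nr_pages, pages); /* hypothetical walker */
	srcu_read_unlock(&gup_fast_srcu, idx);
	return ret;
}

/* Page-table freeing side: synchronize_srcu() waits only for CPUs
 * currently inside the srcu read section (i.e. actually in gup_fast),
 * instead of stalling every CPU as synchronize_sched() would. */
static void tlb_remove_table_sync(void)
{
	synchronize_srcu(&gup_fast_srcu);
}
```

The point of the sketch is the asymmetry: the read side adds only a
per-cpu increment per gup_fast call, while the (rare) freeing side pays
the grace-period cost, and that cost scales with the number of CPUs in
the read section rather than with the total CPU count.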