On 10/02, Peter Zijlstra wrote:
>
> On Wed, Oct 02, 2013 at 02:13:56PM +0200, Oleg Nesterov wrote:
> > In short: unless a gp elapses between _exit() and _enter(), the next
> > _enter() does nothing and avoids synchronize_sched().
>
> That does however make the entire scheme entirely writer biased;

Well, this makes the scheme "a bit more" writer biased, but this is
exactly what we want in this case.

We do not block the readers after xxx_exit() entirely, but we do want
to keep them in SLOW state and avoid the costly SLOW -> FAST -> SLOW
transitions.

Let's even forget about disable_nonboot_cpus(), let's consider
percpu_rwsem-like logic "in general". Yes, it is heavily optimized for
readers. But if the writers come in a batch, or the same writer does
down_write + up_write twice or more, I think state == FAST is pointless
in between (if we can avoid it). This is the rare case (the writers
should be rare), but if it happens it makes sense to optimize the
writers too. And again, even

	for (;;) {
		percpu_down_write();
		percpu_up_write();
	}

should not completely block the readers.

IOW, "turn sync_sched() into call_rcu_sched() in up_write()" is
obviously a win. If the next down_write/xxx_enter "knows" that the
readers are still in SLOW mode because the gp has not completed yet,
why should we add the artificial delay?

As for disable_nonboot_cpus(): you are going to move
cpu_hotplug_begin() outside of the loop, and this is the same thing.

Oleg.
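
To make the SLOW/FAST argument above concrete, here is a toy user-space
model of the idea. This is only a sketch, not the patch under
discussion: every toy_* name is hypothetical, the grace period is
simulated by an explicit toy_gp_elapsed() call, and that call is assumed
to mean "a full gp has passed since the last exit". The blocking
synchronize_sched() is represented by a counter, and the
call_rcu_sched() callback by a pending flag.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_mode { FAST, SLOW };

/* Hypothetical model state; none of these names come from the kernel. */
struct toy_sync {
	enum toy_mode mode;	/* the path readers currently take */
	int writers;		/* writers between enter and exit */
	bool cb_pending;	/* deferred SLOW -> FAST is queued */
	int sync_calls;		/* how often a writer had to block for a gp */
};

/* Writer side, xxx_enter(): force readers into SLOW mode. */
static void toy_enter(struct toy_sync *s)
{
	s->writers++;
	if (s->mode == FAST) {
		s->mode = SLOW;
		s->sync_calls++;	/* stands in for synchronize_sched() */
	}
	/* Otherwise readers are still SLOW and there is nothing to wait for. */
}

/* Writer side, xxx_exit(): do not block, only queue the return to FAST. */
static void toy_exit(struct toy_sync *s)
{
	assert(s->writers > 0);
	if (--s->writers == 0)
		s->cb_pending = true;	/* stands in for call_rcu_sched() */
}

/* Simulated end of a grace period: run the queued callback, if any. */
static void toy_gp_elapsed(struct toy_sync *s)
{
	if (!s->cb_pending)
		return;
	s->cb_pending = false;
	if (s->writers == 0)
		s->mode = FAST;	/* no writer came back, readers go FAST again */
}

/* Run a batch of writers and report how many blocking waits it cost. */
static int toy_count_syncs(int writers, bool gp_between)
{
	struct toy_sync s = { .mode = FAST };

	for (int i = 0; i < writers; i++) {
		toy_enter(&s);
		toy_exit(&s);
		if (gp_between)
			toy_gp_elapsed(&s);
	}
	return s.sync_calls;
}
```

In this model, toy_count_syncs(3, false) pays for one blocking wait,
because the second and third writers find the readers still in SLOW
mode. toy_count_syncs(3, true) pays for three, since a gp elapses after
every exit and each writer has to push the readers out of FAST again.
Readers are never blocked in either case, they only stay on the slow
path for longer.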