On 09/24, Peter Zijlstra wrote:
>
> +static inline void get_online_cpus(void)
> +{
> +	might_sleep();
> +
> +	if (current->cpuhp_ref++) {
> +		barrier();
> +		return;

I don't understand this barrier()... we are going to return if we already
hold the lock, do we really need it?

The same for put_online_cpus().

> +void __get_online_cpus(void)
>  {
> -	if (cpu_hotplug.active_writer == current)
> +	if (cpuhp_writer_task == current)
> 		return;

Probably it would be better to simply inc/dec ->cpuhp_ref in
cpu_hotplug_begin/end and remove this check here and in __put_online_cpus().

This also means that the writer doing get/put_online_cpus() will always use
the fast path, and __cpuhp_writer can go away, cpuhp_writer_task != NULL can
be used instead (a rough sketch is appended at the end of this mail).

> +	atomic_inc(&cpuhp_waitcount);
> +
> +	/*
> +	 * We either call schedule() in the wait, or we'll fall through
> +	 * and reschedule on the preempt_enable() in get_online_cpus().
> +	 */
> +	preempt_enable_no_resched();
> +	wait_event(cpuhp_wq, !__cpuhp_writer);
> +	preempt_disable();
> +
> +	/*
> +	 * It would be possible for cpu_hotplug_done() to complete before
> +	 * the atomic_inc() above; in which case there is no writer waiting
> +	 * and doing a wakeup would be BAD (tm).
> +	 *
> +	 * If however we still observe cpuhp_writer_task here we know
> +	 * cpu_hotplug_done() is currently stuck waiting for cpuhp_waitcount.
> +	 */
> +	if (atomic_dec_and_test(&cpuhp_waitcount) && cpuhp_writer_task)
> +		cpuhp_writer_wake();

cpuhp_writer_wake() here and in __put_online_cpus() looks racy... Not only
can cpuhp_writer_wake() hit cpuhp_writer_task == NULL (we need something
like ACCESS_ONCE(), see the second sketch below), its task_struct can
already be freed/reused if the writer exits.

And I don't really understand the logic... This slow path succeeds without
incrementing any counter (except current->cpuhp_ref)? How can the next
writer notice that it should wait for this reader?

> void cpu_hotplug_done(void)
>  {
> -	cpu_hotplug.active_writer = NULL;
> -	mutex_unlock(&cpu_hotplug.lock);
> +	/* Signal the writer is done */
> +	cpuhp_writer = 0;
> +	wake_up_all(&cpuhp_wq);
> +
> +	/* Wait for any pending readers to be running */
> +	cpuhp_writer_wait(!atomic_read(&cpuhp_waitcount));
> +	cpuhp_writer_task = NULL;

We also need to ensure that the next reader sees all changes done by the
writer, iow this lacks "release" semantics (the first sketch below includes
a barrier for this).

But, Peter, the main question is: why is this better than
percpu_rw_semaphore performance-wise? (Assuming we add
task_struct->cpuhp_ref.)

If the writer is pending, percpu_down_read() does

	down_read(&brw->rw_sem);
	atomic_inc(&brw->slow_read_ctr);
	__up_read(&brw->rw_sem);

is it really much worse than wait_event + atomic_dec_and_test?

And! please note that with your implementation the new readers will likely
be blocked while the writer sleeps in synchronize_sched(). This doesn't
happen with percpu_rw_semaphore (see the last sketch below).

Oleg.
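
A rough, untested sketch of the inc/dec-in-cpu_hotplug_begin/end idea above,
reusing the names from the patch as quoted (cpuhp_writer_task, __cpuhp_writer,
cpuhp_waitcount, cpuhp_wq, cpuhp_writer_wait); the smp_mb() is only there to
illustrate the missing "release" semantics, the exact primitive is a detail:

	void cpu_hotplug_begin(void)
	{
		/*
		 * Make the writer a nested reader of itself, so that
		 * get/put_online_cpus() called from the hotplug path
		 * always take the fast path and the
		 * "cpuhp_writer_task == current" checks in
		 * __get_online_cpus()/__put_online_cpus() can go away.
		 */
		current->cpuhp_ref++;
		cpuhp_writer_task = current;

		/* ... the rest of cpu_hotplug_begin() as in the patch ... */
	}

	void cpu_hotplug_done(void)
	{
		/*
		 * Everything the writer did must be visible to a reader
		 * which observes __cpuhp_writer == 0 below; the readers
		 * need a matching barrier after their check.
		 */
		smp_mb();

		/* Signal the writer is done */
		__cpuhp_writer = 0;
		wake_up_all(&cpuhp_wq);

		/* Wait for any pending readers to be running */
		cpuhp_writer_wait(!atomic_read(&cpuhp_waitcount));

		cpuhp_writer_task = NULL;
		current->cpuhp_ref--;
	}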
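
And, to be explicit about the ACCESS_ONCE() remark, a sketch of the reader's
slow-path wakeup, assuming cpuhp_writer_wake() is essentially
wake_up_process(cpuhp_writer_task). Note that this only avoids re-loading
cpuhp_writer_task between the check and the wakeup; it does nothing for the
freed/reused task_struct problem:

	if (atomic_dec_and_test(&cpuhp_waitcount)) {
		struct task_struct *writer = ACCESS_ONCE(cpuhp_writer_task);

		/*
		 * Read cpuhp_writer_task once, so we can't check a
		 * non-NULL value and then dereference NULL in the wakeup.
		 * If the writer can exit here, this is still not enough.
		 */
		if (writer)
			wake_up_process(writer);
	}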
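
And, for comparison, what the percpu_rw_semaphore variant could look like.
This is only a sketch: cpu_hotplug_rwsem is a made-up name, and
percpu_init_rwsem() at boot plus the serialization of the writers (they
already run under cpu_maps_update_begin/done) are left out:

	static struct percpu_rw_semaphore cpu_hotplug_rwsem;

	void get_online_cpus(void)
	{
		might_sleep();

		if (current->cpuhp_ref++)
			return;

		percpu_down_read(&cpu_hotplug_rwsem);
	}

	void put_online_cpus(void)
	{
		if (--current->cpuhp_ref)
			return;

		percpu_up_read(&cpu_hotplug_rwsem);
	}

	void cpu_hotplug_begin(void)
	{
		/* the writer is a nested reader of itself, see above */
		current->cpuhp_ref++;
		percpu_down_write(&cpu_hotplug_rwsem);
	}

	void cpu_hotplug_done(void)
	{
		percpu_up_write(&cpu_hotplug_rwsem);
		current->cpuhp_ref--;
	}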