Hi,

This patchset removes CPU hotplug's dependence on stop_machine() from the
CPU offline path and provides an alternative (a set of APIs) to
preempt_disable() to prevent CPUs from going offline, which can be invoked
from atomic context. The motivation behind the removal of stop_machine() is
to avoid its ill-effects and thus improve the design of CPU hotplug. (More
description regarding this is available in the patches.)

All the users of preempt_disable()/local_irq_disable() who used to rely on
it to prevent CPU offline have been converted to the new primitives
introduced in the patchset. Also, the CPU_DYING notifiers have been audited
to check whether they can cope with the removal of stop_machine() or whether
they need to use new locks for synchronization (all CPU_DYING notifiers
looked OK, without the need for any new locks).

Applies on current mainline (v3.8-rc7+).

This patchset is available in the following git branch:

git://github.com/srivatsabhat/linux.git  stop-machine-free-cpu-hotplug-v6


Overview of the patches:
-----------------------

Patches 1 to 7 introduce a generic, flexible Per-CPU Reader-Writer Locking
scheme.

Patch 8 uses this synchronization mechanism to build the
get/put_online_cpus_atomic() APIs which can be used from atomic context, to
prevent CPUs from going offline.

Patch 9 is a cleanup; it converts preprocessor macros to static inline
functions.

Patches 10 to 43 convert various call-sites to use the new APIs.

Patch 44 is the one which actually removes stop_machine() from the CPU
offline path.

Patch 45 decouples stop_machine() and CPU hotplug from Kconfig.

Patch 46 updates the documentation to reflect the new APIs.


Changes in v6:
--------------

* Fixed issues related to memory barriers, as pointed out by Paul and Oleg.
* Fixed the locking issue related to clockevents_lock, which was being
  triggered when cpu idle was enabled.
* Some code restructuring to improve readability and to enhance some
  fastpath optimizations.
* Randconfig build-fixes, reported by Fengguang Wu.


Changes in v5:
--------------
Exposed a new generic locking scheme: Flexible Per-CPU Reader-Writer locks,
based on the synchronization schemes already discussed in the previous
versions, and used it in CPU hotplug, to implement the new APIs.

Audited the CPU_DYING notifiers in the kernel source tree and replaced
usages of preempt_disable() with the new get/put_online_cpus_atomic() APIs
where necessary.


Changes in v4:
--------------
The synchronization scheme has been simplified quite a bit, which makes it
look a lot less complex than before. Some highlights:

* Implicit ACKs:

  The earlier design required the readers to explicitly ACK the writer's
  signal. The new design uses implicit ACKs instead. The reader switching
  over to rwlock implicitly tells the writer to stop waiting for that reader.

* No atomic operations:

  Since we got rid of explicit ACKs, we no longer have the need for a reader
  and a writer to update the same counter. So we can get rid of atomic ops
  too.


Changes in v3:
--------------
* Dropped the _light() and _full() variants of the APIs. Provided a single
  interface: get/put_online_cpus_atomic().

* Completely redesigned the synchronization mechanism again, to make it
  fast and scalable at the reader-side in the fast-path (when no hotplug
  writers are active). This new scheme also ensures that there is no
  possibility of deadlocks due to circular locking dependency.

  In summary, this provides the scalability and speed of per-cpu rwlocks
  (without actually using them), while avoiding the downside (deadlock
  possibilities) which is inherent in any per-cpu locking scheme that is
  meant to compete with preempt_disable()/enable() in terms of flexibility.
  The problem with using per-cpu locking to replace
  preempt_disable()/enable() was explained here:
  https://lkml.org/lkml/2012/12/6/290

  Basically we use per-cpu counters (for scalability) when no writers are
  active, and then switch to global rwlocks (for lock-safety) when a writer
  becomes active. It is a slightly complex scheme, but it is based on
  standard principles of distributed algorithms.


Changes in v2:
--------------
* Completely redesigned the synchronization scheme to avoid using any extra
  cpumasks.

* Provided APIs for 2 types of atomic hotplug readers: "light" (for
  light-weight) and "full". We wish to have more "light" readers than
  the "full" ones, to avoid indirectly inducing the "stop_machine effect"
  without even actually using stop_machine().

  And the patches show that it _is_ generally true: 5 patches deal with
  "light" readers, whereas only 1 patch deals with a "full" reader.

  Also, the "light" readers happen to be in very hot paths. So it makes a
  lot of sense to have such a distinction and a corresponding light-weight
  API.


Links to previous versions:
v5: http://lwn.net/Articles/533553/
v4: https://lkml.org/lkml/2012/12/11/209
v3: https://lkml.org/lkml/2012/12/7/287
v2: https://lkml.org/lkml/2012/12/5/322
v1: https://lkml.org/lkml/2012/12/4/88

--
Paul E. McKenney (1):
      cpu: No more __stop_machine() in _cpu_down()

Srivatsa S. Bhat (45):
      percpu_rwlock: Introduce the global reader-writer lock backend
      percpu_rwlock: Introduce per-CPU variables for the reader and the writer
      percpu_rwlock: Provide a way to define and init percpu-rwlocks at compile time
      percpu_rwlock: Implement the core design of Per-CPU Reader-Writer Locks
      percpu_rwlock: Make percpu-rwlocks IRQ-safe, optimally
      percpu_rwlock: Rearrange the read-lock code to fastpath nested percpu readers
      percpu_rwlock: Allow writers to be readers, and add lockdep annotations
      CPU hotplug: Provide APIs to prevent CPU offline from atomic context
      CPU hotplug: Convert preprocessor macros to static inline functions
      smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
      smp, cpu hotplug: Fix on_each_cpu_*() to prevent CPU offline properly
      sched/timer: Use get/put_online_cpus_atomic() to prevent CPU offline
      sched/migration: Use raw_spin_lock/unlock since interrupts are already disabled
      sched/rt: Use get/put_online_cpus_atomic() to prevent CPU offline
      tick: Use get/put_online_cpus_atomic() to prevent CPU offline
      time/clocksource: Use get/put_online_cpus_atomic() to prevent CPU offline
      clockevents: Use get/put_online_cpus_atomic() in clockevents_notify()
      softirq: Use get/put_online_cpus_atomic() to prevent CPU offline
      irq: Use get/put_online_cpus_atomic() to prevent CPU offline
      net: Use get/put_online_cpus_atomic() to prevent CPU offline
      block: Use get/put_online_cpus_atomic() to prevent CPU offline
      crypto: pcrypt - Protect access to cpu_online_mask with get/put_online_cpus()
      infiniband: ehca: Use get/put_online_cpus_atomic() to prevent CPU offline
      [SCSI] fcoe: Use get/put_online_cpus_atomic() to prevent CPU offline
      staging: octeon: Use get/put_online_cpus_atomic() to prevent CPU offline
      x86: Use get/put_online_cpus_atomic() to prevent CPU offline
      perf/x86: Use get/put_online_cpus_atomic() to prevent CPU offline
      KVM: Use get/put_online_cpus_atomic() to prevent CPU offline from atomic context
      kvm/vmx: Use get/put_online_cpus_atomic() to prevent CPU offline
      x86/xen: Use get/put_online_cpus_atomic() to prevent CPU offline
      alpha/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
      blackfin/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
      cris/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
      hexagon/smp: Use get/put_online_cpus_atomic() to prevent CPU offline
      ia64: Use get/put_online_cpus_atomic() to prevent CPU offline
      m32r: Use get/put_online_cpus_atomic() to prevent CPU offline
      MIPS: Use get/put_online_cpus_atomic() to prevent CPU offline
      mn10300: Use get/put_online_cpus_atomic() to prevent CPU offline
      parisc: Use get/put_online_cpus_atomic() to prevent CPU offline
      powerpc: Use get/put_online_cpus_atomic() to prevent CPU offline
      sh: Use get/put_online_cpus_atomic() to prevent CPU offline
      sparc: Use get/put_online_cpus_atomic() to prevent CPU offline
      tile: Use get/put_online_cpus_atomic() to prevent CPU offline
      CPU hotplug, stop_machine: Decouple CPU hotplug from stop_machine() in Kconfig
      Documentation/cpu-hotplug: Remove references to stop_machine()


 Documentation/cpu-hotplug.txt                 |  17 +-
 arch/alpha/kernel/smp.c                       |  19 +-
 arch/arm/Kconfig                              |   1
 arch/blackfin/Kconfig                         |   1
 arch/blackfin/mach-common/smp.c               |   6 -
 arch/cris/arch-v32/kernel/smp.c               |   8 +
 arch/hexagon/kernel/smp.c                     |   5
 arch/ia64/Kconfig                             |   1
 arch/ia64/kernel/irq_ia64.c                   |  13 +
 arch/ia64/kernel/perfmon.c                    |   6 +
 arch/ia64/kernel/smp.c                        |  23 ++
 arch/ia64/mm/tlb.c                            |   6 -
 arch/m32r/kernel/smp.c                        |  12 +
 arch/mips/Kconfig                             |   1
 arch/mips/kernel/cevt-smtc.c                  |   8 +
 arch/mips/kernel/smp.c                        |  16 +-
 arch/mips/kernel/smtc.c                       |   3
 arch/mips/mm/c-octeon.c                       |   4
 arch/mn10300/Kconfig                          |   1
 arch/mn10300/kernel/smp.c                     |   2
 arch/mn10300/mm/cache-smp.c                   |   5
 arch/mn10300/mm/tlb-smp.c                     |  15 +
 arch/parisc/Kconfig                           |   1
 arch/parisc/kernel/smp.c                      |   4
 arch/powerpc/Kconfig                          |   1
 arch/powerpc/mm/mmu_context_nohash.c          |   2
 arch/s390/Kconfig                             |   1
 arch/sh/Kconfig                               |   1
 arch/sh/kernel/smp.c                          |  12 +
 arch/sparc/Kconfig                            |   1
 arch/sparc/kernel/leon_smp.c                  |   2
 arch/sparc/kernel/smp_64.c                    |   9 -
 arch/sparc/kernel/sun4d_smp.c                 |   2
 arch/sparc/kernel/sun4m_smp.c                 |   3
 arch/tile/kernel/smp.c                        |   4
 arch/x86/Kconfig                              |   1
 arch/x86/include/asm/ipi.h                    |   5
 arch/x86/kernel/apic/apic_flat_64.c           |  10 +
 arch/x86/kernel/apic/apic_numachip.c          |   5
 arch/x86/kernel/apic/es7000_32.c              |   5
 arch/x86/kernel/apic/io_apic.c                |   7 -
 arch/x86/kernel/apic/ipi.c                    |  10 +
 arch/x86/kernel/apic/x2apic_cluster.c         |   4
 arch/x86/kernel/apic/x2apic_uv_x.c            |   4
 arch/x86/kernel/cpu/mcheck/therm_throt.c      |   4
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |   5
 arch/x86/kvm/vmx.c                            |   8 +
 arch/x86/mm/tlb.c                             |  14 +
 arch/x86/xen/mmu.c                            |  11 +
 arch/x86/xen/smp.c                            |   9 +
 block/blk-softirq.c                           |   4
 crypto/pcrypt.c                               |   4
 drivers/infiniband/hw/ehca/ehca_irq.c         |   8 +
 drivers/scsi/fcoe/fcoe.c                      |   7 +
 drivers/staging/octeon/ethernet-rx.c          |   3
 include/linux/cpu.h                           |   8 +
 include/linux/percpu-rwlock.h                 |  74 +++++++
 include/linux/stop_machine.h                  |   2
 init/Kconfig                                  |   2
 kernel/cpu.c                                  |  59 +++++-
 kernel/irq/manage.c                           |   7 +
 kernel/sched/core.c                           |  36 +++-
 kernel/sched/fair.c                           |   5
 kernel/sched/rt.c                             |   3
 kernel/smp.c                                  |  65 ++++--
 kernel/softirq.c                              |   3
 kernel/time/clockevents.c                     |   3
 kernel/time/clocksource.c                     |   5
 kernel/time/tick-broadcast.c                  |   2
 kernel/timer.c                                |   2
 lib/Kconfig                                   |   3
 lib/Makefile                                  |   1
 lib/percpu-rwlock.c                           | 256 +++++++++++++++++++++++++
 net/core/dev.c                                |   9 +
 virt/kvm/kvm_main.c                           |  10 +
 75 files changed, 776 insertions(+), 123 deletions(-)
 create mode 100644 include/linux/percpu-rwlock.h
 create mode 100644 lib/percpu-rwlock.c


Regards,
Srivatsa S. Bhat
IBM Linux Technology Center
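
Illustrative sketch (not part of the patchset):

The scheme summarized under "Changes in v3" (per-cpu counters while no
writer is active, a global rwlock once a writer becomes active) can be
modeled in userspace roughly as below. This is only a sketch under
simplifying assumptions and is not the code in lib/percpu-rwlock.c: every
name (model_lock, model_read_lock(), NR_SLOTS, ...) is made up for
illustration, a spinning atomic counter stands in for the kernel rwlock,
sequentially consistent C11 atomics replace the hand-placed memory
barriers, and nested readers and IRQ-safety are not modeled.

```c
/* Userspace model only; all names below are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS 4	/* stand-in for the number of CPUs */

struct model_lock {
	atomic_int reader_refcnt[NR_SLOTS];	/* per-"CPU" fast-path counters */
	atomic_bool writer_active;		/* writer's signal to readers */
	atomic_int global;			/* slow path: -1 writer, n >= 0 readers */
};

static void model_init(struct model_lock *l)
{
	for (int i = 0; i < NR_SLOTS; i++)
		atomic_init(&l->reader_refcnt[i], 0);
	atomic_init(&l->writer_active, false);
	atomic_init(&l->global, 0);
}

/* Returns true if the fast path was taken; pass the result to unlock. */
static bool model_read_lock(struct model_lock *l, int slot)
{
	atomic_fetch_add(&l->reader_refcnt[slot], 1);
	if (!atomic_load(&l->writer_active))
		return true;	/* no writer: the counter alone suffices */

	/* Writer active: dropping the count stops the writer waiting on us. */
	atomic_fetch_sub(&l->reader_refcnt[slot], 1);
	for (;;) {		/* take the global lock for reading */
		int n = atomic_load(&l->global);

		if (n >= 0 &&
		    atomic_compare_exchange_weak(&l->global, &n, n + 1))
			return false;
	}
}

static void model_read_unlock(struct model_lock *l, int slot, bool fast)
{
	if (fast)
		atomic_fetch_sub(&l->reader_refcnt[slot], 1);
	else
		atomic_fetch_sub(&l->global, 1);
}

static void model_write_lock(struct model_lock *l)
{
	int expected = 0;

	/* Take the global lock for writing (0 -> -1). */
	while (!atomic_compare_exchange_weak(&l->global, &expected, -1))
		expected = 0;
	atomic_store(&l->writer_active, true);
	/* Wait for the fast-path readers to drain. */
	for (int i = 0; i < NR_SLOTS; i++)
		while (atomic_load(&l->reader_refcnt[i]) != 0)
			;
}

static void model_write_unlock(struct model_lock *l)
{
	atomic_store(&l->writer_active, false);
	atomic_store(&l->global, 0);
}

/* Single-threaded walk through reader fast path, then a writer. */
int model_demo(void)
{
	struct model_lock l;
	bool fast;

	model_init(&l);
	fast = model_read_lock(&l, 0);	/* no writer yet: fast path */
	assert(fast);
	assert(atomic_load(&l.reader_refcnt[0]) == 1);
	model_read_unlock(&l, 0, fast);

	model_write_lock(&l);		/* readers have already drained */
	assert(atomic_load(&l.writer_active));
	model_write_unlock(&l);
	return 0;
}
```

Having model_read_lock() report which path it took is a shortcut for this
sketch, so that the unlock side knows what to undo. The reader increments
its counter before checking the writer's flag, and the writer sets the flag
before scanning the counters, so at least one side always notices the other.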