Hi, We have two models of preemption: voluntary and full (and RT which is a fuller form of full preemption.) In this series -- which is based on Thomas' PoC (see [1]), we try to unify the two by letting the scheduler enforce policy for the voluntary preemption models as well. (Note that this is about preemption when executing in the kernel. Userspace is always preemptible.) Background == Why?: both of these preemption mechanisms are almost entirely disjoint. There are four main sets of preemption points in the kernel: 1. return to user 2. explicit preemption points (cond_resched() and its ilk) 3. return to kernel (tick/IPI/irq at irqexit) 4. end of non-preemptible sections at (preempt_count() == preempt_offset) Voluntary preemption uses mechanisms 1 and 2. Full preemption uses 1, 3 and 4. In addition both use cond_resched_{rcu,lock,rwlock*} which can be all things to all people because they internally contain 2, and 4. Now since there's no ideal placement of explicit preemption points, they tend to be randomly spread over code and accumulate over time, as they are are added when latency problems are seen. Plus fear of regressions makes them difficult to remove. (Presumably, asymptotically they would spead out evenly across the instruction stream!) In voluntary models, the scheduler's job is to match the demand side of preemption points (a task that needs to be scheduled) with the supply side (a task which calls cond_resched().) Full preemption models track preemption count so the scheduler can always knows if it is safe to preempt and can drive preemption itself (ex. via dynamic preemption points in 3.) Design == As Thomas outlines in [1], to unify the preemption models we want to: always have the preempt_count enabled and allow the scheduler to drive preemption policy based on the model in effect. Policies: - preemption=none: run to completion - preemption=voluntary: run to completion, unless a task of higher sched-class awaits - preemption=full: optimized for low-latency. Preempt whenever a higher priority task awaits. To do this add a new flag, TIF_NEED_RESCHED_LAZY which allows the scheduler to mark that a reschedule is needed, but is deferred until the task finishes executing in the kernel -- voluntary preemption as it were. The TIF_NEED_RESCHED flag is evaluated at all three of the preemption points. TIF_NEED_RESCHED_LAZY only needs to be evaluated at ret-to-user. ret-to-user ret-to-kernel preempt_count() none Y N N voluntary Y Y Y full Y Y Y There's just one remaining issue: now that explicit preemption points are gone, processes that spread a long time in the kernel have no way to give up the CPU. For full preemption, that is a non-issue as we always use TIF_NEED_RESCHED. For none/voluntary preemption, we handle that by upgrading to TIF_NEED_RESCHED if a task marked TIF_NEED_RESCHED_LAZY hasn't preempted away by the next tick. (This would cause preemption either at ret-to-kernel, or if the task is in a non-preemptible section, when it exits that section.) Arguably this provides for much more consistent maximum latency (~2 tick lengths + length of non-preemptible section) as compared to the old model where the maximum latency depended on the dynamic distribution of cond_resched() points. (As a bonus it handles code that is preemptible but cannot call cond_resched() completely trivially: ex. long running Xen hypercalls, or this series which started this discussion: https://lore.kernel.org/all/20230830184958.2333078-8-ankur.a.arora@xxxxxxxxxx/) Status == What works: - The system seems to keep ticking over with the normal scheduling policies (SCHED_OTHER). The support for the realtime policies is somewhat more half baked.) - The basic performance numbers seem pretty close to 6.6-rc7 baseline What's broken: - ARCH_NO_PREEMPT (See patch-45 "preempt: ARCH_NO_PREEMPT only preempts lazily") - Non-x86 architectures. It's trivial to support other archs (only need to add TIF_NEED_RESCHED_LAZY) but wanted to hold off until I got some comments on the series. (From some testing on arm64, didn't find any surprises.) - livepatch: livepatch depends on using _cond_resched() to provide low-latency patching. That is obviously difficult with cond_resched() gone. We could get a similar effect by using a static_key in preempt_enable() but at least with inline locks, that might be end up bloating the kernel quite a bit. - Documentation/ and comments mention cond_resched() - ftrace support for need-resched-lazy is incomplete What needs more discussion: - Should cond_resched_lock() etc be scheduling out for TIF_NEED_RESCHED only or both TIF_NEED_RESCHED_LAZY as well? (See patch 35 "thread_info: change to tif_need_resched(resched_t)") - Tracking whether a task in userspace or in the kernel (See patch-40 "context_tracking: add ct_state_cpu()") - The right model for preempt=voluntary. (See patch 44 "sched: voluntary preemption") Performance == Expectation: * perf sched bench pipe preemption full none 6.6-rc7 6.68 +- 0.10 6.69 +- 0.07 +series 6.69 +- 0.12 6.67 +- 0.10 This is rescheduling out of idle which should and does perform identically. * schbench, preempt=none * 1 group, 16 threads each 6.6-rc7 +series (usecs) (usecs) 50.0th: 6 6 90.0th: 8 7 99.0th: 11 11 99.9th: 15 14 * 8 groups, 16 threads each 6.6-rc7 +series (usecs) (usecs) 50.0th: 6 6 90.0th: 8 8 99.0th: 12 11 99.9th: 20 21 * schbench, preempt=full * 1 group, 16 threads each 6.6-rc7 +series (usecs) (usecs) 50.0th: 6 6 90.0th: 8 7 99.0th: 11 11 99.9th: 14 14 * 8 groups, 16 threads each 6.6-rc7 +series (usecs) (usecs) 50.0th: 7 7 90.0th: 9 9 99.0th: 12 12 99.9th: 21 22 Not much in it either way. * kernbench, preempt=full * half-load (-j 128) 6.6-rc7 +series wall 149.2 +- 27.2 wall 132.8 +- 0.4 utime 8097.1 +- 57.4 utime 8088.5 +- 14.1 stime 1165.5 +- 9.4 stime 1159.2 +- 1.9 %cpu 6337.6 +- 1072.8 %cpu 6959.6 +- 22.8 csw 237618 +- 2190.6 %csw 240343 +- 1386.8 * optimal-load (-j 1024) 6.6-rc7 +series wall 137.8 +- 0.0 wall 137.7 +- 0.8 utime 11115.0 +- 3306.1 utime 11041.7 +- 3235.0 stime 1340.0 +- 191.3 stime 1323.1 +- 179.5 %cpu 8846.3 +- 2830.6 %cpu 9101.3 +- 2346.7 csw 2099910 +- 2040080.0 csw 2068210 +- 2002450.0 The preempt=full path should effectively not see any change in behaviour. The optimal-loads are pretty much identical. For the half-load, however, the +series version does much better but that seems to be because of much higher run to run variability in the 6.6-rc7 load. * kernbench, preempt=none * half-load (-j 128) 6.6-rc7 +series wall 134.5 +- 4.2 wall 133.6 +- 2.7 utime 8093.3 +- 39.3 utime 8099.0 +- 38.9 stime 1175.7 +- 10.6 stime 1169.1 +- 8.4 %cpu 6893.3 +- 233.2 %cpu 6936.3 +- 142.8 csw 240723 +- 423.0 %csw 173152 +- 1126.8 * optimal-load (-j 1024) 6.6-rc7 +series wall 139.2 +- 0.3 wall 138.8 +- 0.2 utime 11161.0 +- 3360.4 utime 11061.2 +- 3244.9 stime 1357.6 +- 199.3 stime 1366.6 +- 216.3 %cpu 9108.8 +- 2431.4 %cpu 9081.0 +- 2351.1 csw 2078599 +- 2013320.0 csw 1970610 +- 1969030.0 For both of these the wallclock, utime, stime etc are pretty much identical. The one interesting difference is that the number of context switches are fewer. This intuitively makes sense given that we reschedule threads lazily rather than rescheduling if we encounter a cond_resched() and there's a thread wanting to be scheduled. The max-load numbers (not posted here) also behave similarly. Series == With that, this is how he series is laid out: - Patches 01-30: revert the PREEMPT_DYNAMIC code. Most of the infrastructure used by that is via static_calls() and this is a simpler approach which doesn't need any of that (and does away with cond_resched().) Some of the commits will be resurrected. 089c02ae2771 ("ftrace: Use preemption model accessors for trace header printout") cfe43f478b79 ("preempt/dynamic: Introduce preemption model accessors") 5693fa74f98a ("kcsan: Use preemption model accessors") - Patches 31-45: contain the scheduler changes to do this. Of these the critical ones are: patch 35 "thread_info: change to tif_need_resched(resched_t)" patch 41 "sched: handle resched policy in resched_curr()" patch 43 "sched: enable PREEMPT_COUNT, PREEMPTION for all preemption models" patch 44 "sched: voluntary preemption" (this needs more work to decide when a higher sched-policy task should preempt a lower sched-policy task) patch 45 "preempt: ARCH_NO_PREEMPT only preempts lazily" - Patches 47-50: contain RCU related changes. RCU now works in both PREEMPT_RCU=y and PREEMPT_RCU=n modes with CONFIG_PREEMPTION. (Until now PREEMPTION=y => PREEMPT_RCU) - Patches 51-56,86: contain cond_resched() related cleanups. patch 54 "sched: add cond_resched_stall()" adds a new cond_resched() interface. Pitchforks? - Patches 57-86: remove cond_resched() from the tree. Also at: github.com/terminus/linux preemption-rfc Please review. Thanks Ankur [1] https://lore.kernel.org/lkml/87jzshhexi.ffs@tglx/ Ankur Arora (86): Revert "riscv: support PREEMPT_DYNAMIC with static keys" Revert "sched/core: Make sched_dynamic_mutex static" Revert "ftrace: Use preemption model accessors for trace header printout" Revert "preempt/dynamic: Introduce preemption model accessors" Revert "kcsan: Use preemption model accessors" Revert "entry: Fix compile error in dynamic_irqentry_exit_cond_resched()" Revert "livepatch,sched: Add livepatch task switching to cond_resched()" Revert "arm64: Support PREEMPT_DYNAMIC" Revert "sched/preempt: Add PREEMPT_DYNAMIC using static keys" Revert "sched/preempt: Decouple HAVE_PREEMPT_DYNAMIC from GENERIC_ENTRY" Revert "sched/preempt: Simplify irqentry_exit_cond_resched() callers" Revert "sched/preempt: Refactor sched_dynamic_update()" Revert "sched/preempt: Move PREEMPT_DYNAMIC logic later" Revert "preempt/dynamic: Fix setup_preempt_mode() return value" Revert "preempt: Restore preemption model selection configs" Revert "sched: Provide Kconfig support for default dynamic preempt mode" sched/preempt: remove PREEMPT_DYNAMIC from the build version Revert "preempt/dynamic: Fix typo in macro conditional statement" Revert "sched,preempt: Move preempt_dynamic to debug.c" Revert "static_call: Relax static_call_update() function argument type" Revert "sched/core: Use -EINVAL in sched_dynamic_mode()" Revert "sched/core: Stop using magic values in sched_dynamic_mode()" Revert "sched,x86: Allow !PREEMPT_DYNAMIC" Revert "sched: Harden PREEMPT_DYNAMIC" Revert "sched: Add /debug/sched_preempt" Revert "preempt/dynamic: Support dynamic preempt with preempt= boot option" Revert "preempt/dynamic: Provide irqentry_exit_cond_resched() static call" Revert "preempt/dynamic: Provide preempt_schedule[_notrace]() static calls" Revert "preempt/dynamic: Provide cond_resched() and might_resched() static calls" Revert "preempt: Introduce CONFIG_PREEMPT_DYNAMIC" x86/thread_info: add TIF_NEED_RESCHED_LAZY entry: handle TIF_NEED_RESCHED_LAZY entry/kvm: handle TIF_NEED_RESCHED_LAZY thread_info: accessors for TIF_NEED_RESCHED* thread_info: change to tif_need_resched(resched_t) entry: irqentry_exit only preempts TIF_NEED_RESCHED sched: make test_*_tsk_thread_flag() return bool sched: *_tsk_need_resched() now takes resched_t sched: handle lazy resched in set_nr_*_polling() context_tracking: add ct_state_cpu() sched: handle resched policy in resched_curr() sched: force preemption on tick expiration sched: enable PREEMPT_COUNT, PREEMPTION for all preemption models sched: voluntary preemption preempt: ARCH_NO_PREEMPT only preempts lazily tracing: handle lazy resched rcu: select PREEMPT_RCU if PREEMPT rcu: handle quiescent states for PREEMPT_RCU=n osnoise: handle quiescent states directly rcu: TASKS_RCU does not need to depend on PREEMPTION preempt: disallow !PREEMPT_COUNT or !PREEMPTION sched: remove CONFIG_PREEMPTION from *_needbreak() sched: fixup __cond_resched_*() sched: add cond_resched_stall() xarray: add cond_resched_xas_rcu() and cond_resched_xas_lock_irq() xarray: use cond_resched_xas*() coccinelle: script to remove cond_resched() treewide: x86: remove cond_resched() treewide: rcu: remove cond_resched() treewide: torture: remove cond_resched() treewide: bpf: remove cond_resched() treewide: trace: remove cond_resched() treewide: futex: remove cond_resched() treewide: printk: remove cond_resched() treewide: task_work: remove cond_resched() treewide: kernel: remove cond_resched() treewide: kernel: remove cond_reshed() treewide: mm: remove cond_resched() treewide: io_uring: remove cond_resched() treewide: ipc: remove cond_resched() treewide: lib: remove cond_resched() treewide: crypto: remove cond_resched() treewide: security: remove cond_resched() treewide: fs: remove cond_resched() treewide: virt: remove cond_resched() treewide: block: remove cond_resched() treewide: netfilter: remove cond_resched() treewide: net: remove cond_resched() treewide: net: remove cond_resched() treewide: sound: remove cond_resched() treewide: md: remove cond_resched() treewide: mtd: remove cond_resched() treewide: drm: remove cond_resched() treewide: net: remove cond_resched() treewide: drivers: remove cond_resched() sched: remove cond_resched() .../admin-guide/kernel-parameters.txt | 7 - arch/Kconfig | 42 +- arch/arm64/Kconfig | 1 - arch/arm64/include/asm/preempt.h | 19 +- arch/arm64/kernel/entry-common.c | 10 +- arch/riscv/Kconfig | 1 - arch/s390/include/asm/preempt.h | 4 +- arch/x86/Kconfig | 1 - arch/x86/include/asm/preempt.h | 50 +- arch/x86/include/asm/thread_info.h | 6 +- arch/x86/kernel/alternative.c | 10 - arch/x86/kernel/cpu/sgx/encl.c | 14 +- arch/x86/kernel/cpu/sgx/ioctl.c | 3 - arch/x86/kernel/cpu/sgx/main.c | 5 - arch/x86/kernel/cpu/sgx/virt.c | 4 - arch/x86/kvm/lapic.c | 6 +- arch/x86/kvm/mmu/mmu.c | 2 +- arch/x86/kvm/svm/sev.c | 5 +- arch/x86/net/bpf_jit_comp.c | 1 - arch/x86/net/bpf_jit_comp32.c | 1 - arch/x86/xen/mmu_pv.c | 1 - block/blk-cgroup.c | 2 - block/blk-lib.c | 11 - block/blk-mq.c | 3 - block/blk-zoned.c | 6 - crypto/internal.h | 2 +- crypto/tcrypt.c | 5 - crypto/testmgr.c | 10 - drivers/accel/ivpu/ivpu_drv.c | 2 - drivers/accel/ivpu/ivpu_gem.c | 1 - drivers/accel/ivpu/ivpu_pm.c | 8 +- drivers/accel/qaic/qaic_data.c | 2 - drivers/acpi/processor_idle.c | 2 +- drivers/auxdisplay/charlcd.c | 11 - drivers/base/power/domain.c | 1 - drivers/block/aoe/aoecmd.c | 3 +- drivers/block/brd.c | 1 - drivers/block/drbd/drbd_bitmap.c | 4 - drivers/block/drbd/drbd_debugfs.c | 1 - drivers/block/loop.c | 3 - drivers/block/xen-blkback/blkback.c | 3 - drivers/block/zram/zram_drv.c | 2 - drivers/bluetooth/virtio_bt.c | 1 - drivers/char/hw_random/arm_smccc_trng.c | 1 - drivers/char/lp.c | 2 - drivers/char/mem.c | 4 - drivers/char/mwave/3780i.c | 4 +- drivers/char/ppdev.c | 4 - drivers/char/random.c | 2 - drivers/char/virtio_console.c | 1 - drivers/crypto/virtio/virtio_crypto_core.c | 1 - drivers/cxl/pci.c | 1 - drivers/dma-buf/selftest.c | 1 - drivers/dma-buf/st-dma-fence-chain.c | 1 - drivers/fsi/fsi-sbefifo.c | 14 +- drivers/gpu/drm/bridge/samsung-dsim.c | 2 +- drivers/gpu/drm/drm_buddy.c | 1 - drivers/gpu/drm/drm_gem.c | 1 - .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_object.c | 1 - drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 2 - .../gpu/drm/i915/gem/selftests/huge_pages.c | 6 - .../drm/i915/gem/selftests/i915_gem_mman.c | 5 - drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 2 +- drivers/gpu/drm/i915/gt/intel_gt.c | 2 +- drivers/gpu/drm/i915/gt/intel_migrate.c | 4 - drivers/gpu/drm/i915/gt/selftest_execlists.c | 4 - drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 2 - drivers/gpu/drm/i915/gt/selftest_lrc.c | 2 - drivers/gpu/drm/i915/gt/selftest_migrate.c | 2 - drivers/gpu/drm/i915/gt/selftest_timeline.c | 4 - drivers/gpu/drm/i915/i915_active.c | 2 +- drivers/gpu/drm/i915/i915_gem_evict.c | 2 - drivers/gpu/drm/i915/i915_gpu_error.c | 18 +- drivers/gpu/drm/i915/intel_uncore.c | 1 - drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 2 - drivers/gpu/drm/i915/selftests/i915_request.c | 2 - .../gpu/drm/i915/selftests/i915_selftest.c | 3 - drivers/gpu/drm/i915/selftests/i915_vma.c | 9 - .../gpu/drm/i915/selftests/igt_flush_test.c | 2 - .../drm/i915/selftests/intel_memory_region.c | 4 - drivers/gpu/drm/tests/drm_buddy_test.c | 5 - drivers/gpu/drm/tests/drm_mm_test.c | 29 - drivers/i2c/busses/i2c-bcm-iproc.c | 9 +- drivers/i2c/busses/i2c-highlander.c | 9 +- drivers/i2c/busses/i2c-ibm_iic.c | 11 +- drivers/i2c/busses/i2c-mpc.c | 2 +- drivers/i2c/busses/i2c-mxs.c | 9 +- drivers/i2c/busses/scx200_acb.c | 9 +- drivers/infiniband/core/umem.c | 1 - drivers/infiniband/hw/hfi1/driver.c | 1 - drivers/infiniband/hw/hfi1/firmware.c | 2 +- drivers/infiniband/hw/hfi1/init.c | 1 - drivers/infiniband/hw/hfi1/ruc.c | 1 - drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 5 +- drivers/infiniband/hw/qib/qib_init.c | 1 - drivers/infiniband/sw/rxe/rxe_qp.c | 3 +- drivers/infiniband/sw/rxe/rxe_task.c | 4 +- drivers/input/evdev.c | 1 - drivers/input/keyboard/clps711x-keypad.c | 2 +- drivers/input/misc/uinput.c | 1 - drivers/input/mousedev.c | 1 - drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 - drivers/md/bcache/btree.c | 5 - drivers/md/bcache/journal.c | 2 - drivers/md/bcache/sysfs.c | 1 - drivers/md/bcache/writeback.c | 2 - drivers/md/dm-bufio.c | 14 - drivers/md/dm-cache-target.c | 4 - drivers/md/dm-crypt.c | 3 - drivers/md/dm-integrity.c | 3 - drivers/md/dm-kcopyd.c | 2 - drivers/md/dm-snap.c | 1 - drivers/md/dm-stats.c | 8 - drivers/md/dm-thin.c | 2 - drivers/md/dm-writecache.c | 11 - drivers/md/dm.c | 4 - drivers/md/md.c | 1 - drivers/md/raid1.c | 2 - drivers/md/raid10.c | 3 - drivers/md/raid5.c | 2 - drivers/media/i2c/vpx3220.c | 3 - drivers/media/pci/cobalt/cobalt-i2c.c | 4 +- drivers/misc/bcm-vk/bcm_vk_dev.c | 3 +- drivers/misc/bcm-vk/bcm_vk_msg.c | 3 +- drivers/misc/genwqe/card_base.c | 3 +- drivers/misc/genwqe/card_ddcb.c | 6 - drivers/misc/genwqe/card_dev.c | 2 - drivers/misc/vmw_balloon.c | 4 - drivers/mmc/host/mmc_spi.c | 3 - drivers/mtd/chips/cfi_cmdset_0001.c | 6 - drivers/mtd/chips/cfi_cmdset_0002.c | 1 - drivers/mtd/chips/cfi_util.c | 2 +- drivers/mtd/devices/spear_smi.c | 2 +- drivers/mtd/devices/sst25l.c | 3 +- drivers/mtd/devices/st_spi_fsm.c | 4 - drivers/mtd/inftlcore.c | 5 - drivers/mtd/lpddr/lpddr_cmds.c | 6 +- drivers/mtd/mtd_blkdevs.c | 1 - drivers/mtd/nand/onenand/onenand_base.c | 18 +- drivers/mtd/nand/onenand/onenand_samsung.c | 8 +- drivers/mtd/nand/raw/diskonchip.c | 4 +- drivers/mtd/nand/raw/fsmc_nand.c | 3 +- drivers/mtd/nand/raw/hisi504_nand.c | 2 +- drivers/mtd/nand/raw/nand_base.c | 3 +- drivers/mtd/nand/raw/nand_legacy.c | 17 +- drivers/mtd/spi-nor/core.c | 8 +- drivers/mtd/tests/mtd_test.c | 2 - drivers/mtd/tests/mtd_test.h | 2 +- drivers/mtd/tests/pagetest.c | 1 - drivers/mtd/tests/readtest.c | 2 - drivers/mtd/tests/torturetest.c | 1 - drivers/mtd/ubi/attach.c | 10 - drivers/mtd/ubi/build.c | 2 - drivers/mtd/ubi/cdev.c | 4 - drivers/mtd/ubi/eba.c | 8 - drivers/mtd/ubi/misc.c | 2 - drivers/mtd/ubi/vtbl.c | 6 - drivers/mtd/ubi/wl.c | 13 - drivers/net/dummy.c | 1 - drivers/net/ethernet/broadcom/tg3.c | 2 +- drivers/net/ethernet/intel/e1000/e1000_hw.c | 3 - drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 +- drivers/net/ethernet/mellanox/mlx4/catas.c | 2 +- drivers/net/ethernet/mellanox/mlx4/cmd.c | 13 +- .../ethernet/mellanox/mlx4/resource_tracker.c | 9 +- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/fw.c | 3 +- drivers/net/ethernet/mellanox/mlxsw/i2c.c | 5 - drivers/net/ethernet/mellanox/mlxsw/pci.c | 2 - drivers/net/ethernet/pasemi/pasemi_mac.c | 3 - .../ethernet/qlogic/netxen/netxen_nic_init.c | 2 - .../ethernet/qlogic/qlcnic/qlcnic_83xx_init.c | 1 - .../net/ethernet/qlogic/qlcnic/qlcnic_init.c | 1 - .../ethernet/qlogic/qlcnic/qlcnic_minidump.c | 2 - drivers/net/ethernet/sfc/falcon/falcon.c | 6 - drivers/net/ifb.c | 1 - drivers/net/ipvlan/ipvlan_core.c | 1 - drivers/net/macvlan.c | 2 - drivers/net/mhi_net.c | 4 +- drivers/net/netdevsim/fib.c | 1 - drivers/net/virtio_net.c | 2 - drivers/net/wireguard/ratelimiter.c | 2 - drivers/net/wireguard/receive.c | 3 - drivers/net/wireguard/send.c | 4 - drivers/net/wireless/broadcom/b43/lo.c | 6 +- drivers/net/wireless/broadcom/b43/pio.c | 1 - drivers/net/wireless/broadcom/b43legacy/phy.c | 5 - .../broadcom/brcm80211/brcmfmac/cfg80211.c | 1 - drivers/net/wireless/cisco/airo.c | 2 - .../net/wireless/intel/iwlwifi/pcie/trans.c | 2 - drivers/net/wireless/marvell/mwl8k.c | 2 - drivers/net/wireless/mediatek/mt76/util.c | 1 - drivers/net/wwan/mhi_wwan_mbim.c | 2 +- drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c | 3 - drivers/net/xen-netback/netback.c | 1 - drivers/net/xen-netback/rx.c | 2 - drivers/nvdimm/btt.c | 2 - drivers/nvme/target/zns.c | 2 - drivers/parport/parport_ip32.c | 1 - drivers/parport/parport_pc.c | 4 - drivers/pci/pci-sysfs.c | 1 - drivers/pci/proc.c | 1 - .../intel/speed_select_if/isst_if_mbox_pci.c | 4 +- drivers/s390/cio/css.c | 8 - drivers/scsi/NCR5380.c | 2 - drivers/scsi/megaraid.c | 1 - drivers/scsi/qedi/qedi_main.c | 1 - drivers/scsi/qla2xxx/qla_nx.c | 2 - drivers/scsi/qla2xxx/qla_sup.c | 5 - drivers/scsi/qla4xxx/ql4_nx.c | 1 - drivers/scsi/xen-scsifront.c | 2 +- drivers/spi/spi-lantiq-ssc.c | 3 +- drivers/spi/spi-meson-spifc.c | 2 +- drivers/spi/spi.c | 2 +- drivers/staging/rtl8723bs/core/rtw_mlme_ext.c | 2 +- drivers/staging/rtl8723bs/core/rtw_pwrctrl.c | 2 - drivers/tee/optee/ffa_abi.c | 1 - drivers/tee/optee/smc_abi.c | 1 - drivers/tty/hvc/hvc_console.c | 6 +- drivers/tty/tty_buffer.c | 3 - drivers/tty/tty_io.c | 1 - drivers/usb/gadget/udc/max3420_udc.c | 1 - drivers/usb/host/max3421-hcd.c | 2 +- drivers/usb/host/xen-hcd.c | 2 +- drivers/vfio/vfio_iommu_spapr_tce.c | 2 - drivers/vfio/vfio_iommu_type1.c | 7 - drivers/vhost/vhost.c | 1 - drivers/video/console/vgacon.c | 4 - drivers/virtio/virtio_mem.c | 8 - drivers/xen/balloon.c | 2 - drivers/xen/gntdev.c | 2 - drivers/xen/xen-scsiback.c | 9 +- fs/afs/write.c | 2 - fs/btrfs/backref.c | 6 - fs/btrfs/block-group.c | 3 - fs/btrfs/ctree.c | 1 - fs/btrfs/defrag.c | 1 - fs/btrfs/disk-io.c | 3 - fs/btrfs/extent-io-tree.c | 5 - fs/btrfs/extent-tree.c | 8 - fs/btrfs/extent_io.c | 9 - fs/btrfs/file-item.c | 1 - fs/btrfs/file.c | 4 - fs/btrfs/free-space-cache.c | 4 - fs/btrfs/inode.c | 9 - fs/btrfs/ordered-data.c | 2 - fs/btrfs/qgroup.c | 1 - fs/btrfs/reflink.c | 2 - fs/btrfs/relocation.c | 9 - fs/btrfs/scrub.c | 3 - fs/btrfs/send.c | 1 - fs/btrfs/space-info.c | 1 - fs/btrfs/tests/extent-io-tests.c | 1 - fs/btrfs/transaction.c | 3 - fs/btrfs/tree-log.c | 12 - fs/btrfs/uuid-tree.c | 1 - fs/btrfs/volumes.c | 2 - fs/buffer.c | 1 - fs/cachefiles/cache.c | 4 +- fs/cachefiles/namei.c | 1 - fs/cachefiles/volume.c | 1 - fs/ceph/addr.c | 1 - fs/dax.c | 16 +- fs/dcache.c | 2 - fs/dlm/ast.c | 1 - fs/dlm/dir.c | 2 - fs/dlm/lock.c | 3 - fs/dlm/lowcomms.c | 3 - fs/dlm/recover.c | 1 - fs/drop_caches.c | 1 - fs/erofs/utils.c | 1 - fs/erofs/zdata.c | 8 +- fs/eventpoll.c | 3 - fs/exec.c | 4 - fs/ext4/block_validity.c | 2 - fs/ext4/dir.c | 1 - fs/ext4/extents.c | 1 - fs/ext4/ialloc.c | 1 - fs/ext4/inode.c | 1 - fs/ext4/mballoc.c | 12 +- fs/ext4/namei.c | 3 - fs/ext4/orphan.c | 1 - fs/ext4/super.c | 2 - fs/f2fs/checkpoint.c | 16 +- fs/f2fs/compress.c | 1 - fs/f2fs/data.c | 3 - fs/f2fs/dir.c | 1 - fs/f2fs/extent_cache.c | 1 - fs/f2fs/f2fs.h | 6 +- fs/f2fs/file.c | 3 - fs/f2fs/node.c | 4 - fs/f2fs/super.c | 1 - fs/fat/fatent.c | 2 - fs/file.c | 7 +- fs/fs-writeback.c | 3 - fs/gfs2/aops.c | 1 - fs/gfs2/bmap.c | 2 - fs/gfs2/glock.c | 2 +- fs/gfs2/log.c | 1 - fs/gfs2/ops_fstype.c | 1 - fs/hpfs/buffer.c | 8 - fs/hugetlbfs/inode.c | 3 - fs/inode.c | 3 - fs/iomap/buffered-io.c | 7 +- fs/jbd2/checkpoint.c | 2 - fs/jbd2/commit.c | 3 - fs/jbd2/recovery.c | 2 - fs/jffs2/build.c | 6 +- fs/jffs2/erase.c | 3 - fs/jffs2/gc.c | 2 - fs/jffs2/nodelist.c | 1 - fs/jffs2/nodemgmt.c | 11 +- fs/jffs2/readinode.c | 2 - fs/jffs2/scan.c | 4 - fs/jffs2/summary.c | 2 - fs/jfs/jfs_txnmgr.c | 14 +- fs/libfs.c | 5 +- fs/mbcache.c | 1 - fs/namei.c | 1 - fs/netfs/io.c | 1 - fs/nfs/delegation.c | 3 - fs/nfs/pnfs.c | 2 - fs/nfs/write.c | 4 - fs/nilfs2/btree.c | 1 - fs/nilfs2/inode.c | 1 - fs/nilfs2/page.c | 4 - fs/nilfs2/segment.c | 4 - fs/notify/fanotify/fanotify_user.c | 1 - fs/notify/fsnotify.c | 1 - fs/ntfs/attrib.c | 3 - fs/ntfs/file.c | 2 - fs/ntfs3/file.c | 9 - fs/ntfs3/frecord.c | 2 - fs/ocfs2/alloc.c | 4 +- fs/ocfs2/cluster/tcp.c | 8 +- fs/ocfs2/dlm/dlmthread.c | 7 +- fs/ocfs2/file.c | 10 +- fs/proc/base.c | 1 - fs/proc/fd.c | 1 - fs/proc/kcore.c | 1 - fs/proc/page.c | 6 - fs/proc/task_mmu.c | 7 - fs/quota/dquot.c | 1 - fs/reiserfs/journal.c | 2 - fs/select.c | 1 - fs/smb/client/file.c | 2 - fs/splice.c | 1 - fs/ubifs/budget.c | 1 - fs/ubifs/commit.c | 1 - fs/ubifs/debug.c | 5 - fs/ubifs/dir.c | 1 - fs/ubifs/gc.c | 5 - fs/ubifs/io.c | 2 - fs/ubifs/lprops.c | 2 - fs/ubifs/lpt_commit.c | 3 - fs/ubifs/orphan.c | 1 - fs/ubifs/recovery.c | 4 - fs/ubifs/replay.c | 7 - fs/ubifs/scan.c | 2 - fs/ubifs/shrinker.c | 1 - fs/ubifs/super.c | 2 - fs/ubifs/tnc_commit.c | 2 - fs/ubifs/tnc_misc.c | 1 - fs/userfaultfd.c | 9 - fs/verity/enable.c | 1 - fs/verity/read_metadata.c | 1 - fs/xfs/scrub/common.h | 7 - fs/xfs/scrub/xfarray.c | 7 - fs/xfs/xfs_aops.c | 1 - fs/xfs/xfs_icache.c | 2 - fs/xfs/xfs_iwalk.c | 1 - include/asm-generic/preempt.h | 18 +- include/linux/console.h | 2 +- include/linux/context_tracking_state.h | 21 + include/linux/entry-common.h | 19 +- include/linux/entry-kvm.h | 2 +- include/linux/kernel.h | 32 +- include/linux/livepatch.h | 1 - include/linux/livepatch_sched.h | 29 - include/linux/preempt.h | 44 +- include/linux/rcupdate.h | 10 +- include/linux/rcutree.h | 2 +- include/linux/sched.h | 153 ++---- include/linux/sched/cond_resched.h | 1 - include/linux/sched/idle.h | 8 +- include/linux/thread_info.h | 29 +- include/linux/trace_events.h | 6 +- include/linux/vermagic.h | 2 +- include/linux/xarray.h | 14 + init/Makefile | 3 +- io_uring/io-wq.c | 4 +- io_uring/io_uring.c | 21 +- io_uring/kbuf.c | 2 - io_uring/sqpoll.c | 6 +- io_uring/tctx.c | 4 +- ipc/msgutil.c | 3 - ipc/sem.c | 2 - kernel/Kconfig.preempt | 70 +-- kernel/auditsc.c | 2 - kernel/bpf/Kconfig | 2 +- kernel/bpf/arraymap.c | 3 - kernel/bpf/bpf_iter.c | 7 +- kernel/bpf/btf.c | 9 - kernel/bpf/cpumap.c | 2 - kernel/bpf/hashtab.c | 7 - kernel/bpf/syscall.c | 3 - kernel/bpf/verifier.c | 5 - kernel/cgroup/rstat.c | 3 +- kernel/dma/debug.c | 2 - kernel/entry/common.c | 32 +- kernel/entry/kvm.c | 4 +- kernel/events/core.c | 2 +- kernel/futex/core.c | 6 +- kernel/futex/pi.c | 6 +- kernel/futex/requeue.c | 1 - kernel/futex/waitwake.c | 2 +- kernel/gcov/base.c | 1 - kernel/hung_task.c | 6 +- kernel/kallsyms.c | 4 +- kernel/kcsan/kcsan_test.c | 5 +- kernel/kexec_core.c | 6 - kernel/kthread.c | 1 - kernel/livepatch/core.c | 1 - kernel/livepatch/transition.c | 107 +--- kernel/locking/test-ww_mutex.c | 4 +- kernel/module/main.c | 1 - kernel/printk/printk.c | 65 +-- kernel/ptrace.c | 2 - kernel/rcu/Kconfig | 4 +- kernel/rcu/rcuscale.c | 2 - kernel/rcu/rcutorture.c | 8 +- kernel/rcu/tasks.h | 5 +- kernel/rcu/tree.c | 4 +- kernel/rcu/tree_exp.h | 4 +- kernel/rcu/tree_plugin.h | 7 +- kernel/rcu/tree_stall.h | 2 +- kernel/scftorture.c | 1 - kernel/sched/core.c | 497 +++++------------- kernel/sched/core_sched.c | 2 +- kernel/sched/deadline.c | 26 +- kernel/sched/debug.c | 67 +-- kernel/sched/fair.c | 54 +- kernel/sched/features.h | 18 + kernel/sched/idle.c | 6 +- kernel/sched/rt.c | 35 +- kernel/sched/sched.h | 9 +- kernel/softirq.c | 1 - kernel/stop_machine.c | 2 +- kernel/task_work.c | 1 - kernel/torture.c | 1 - kernel/trace/Kconfig | 4 +- kernel/trace/ftrace.c | 4 - kernel/trace/ring_buffer.c | 4 - kernel/trace/ring_buffer_benchmark.c | 13 - kernel/trace/trace.c | 29 +- kernel/trace/trace_events.c | 1 - kernel/trace/trace_osnoise.c | 37 +- kernel/trace/trace_output.c | 16 +- kernel/trace/trace_selftest.c | 9 - kernel/workqueue.c | 10 - lib/crc32test.c | 2 - lib/crypto/mpi/mpi-pow.c | 1 - lib/memcpy_kunit.c | 5 - lib/random32.c | 1 - lib/rhashtable.c | 2 - lib/test_bpf.c | 3 - lib/test_lockup.c | 2 +- lib/test_maple_tree.c | 8 - lib/test_rhashtable.c | 10 - mm/backing-dev.c | 8 +- mm/compaction.c | 23 +- mm/damon/paddr.c | 1 - mm/dmapool_test.c | 2 - mm/filemap.c | 11 +- mm/gup.c | 1 - mm/huge_memory.c | 3 - mm/hugetlb.c | 12 - mm/hugetlb_cgroup.c | 1 - mm/kasan/quarantine.c | 6 +- mm/kfence/kfence_test.c | 22 +- mm/khugepaged.c | 10 +- mm/kmemleak.c | 8 - mm/ksm.c | 21 +- mm/madvise.c | 3 - mm/memcontrol.c | 4 - mm/memfd.c | 10 +- mm/memory-failure.c | 1 - mm/memory.c | 12 +- mm/memory_hotplug.c | 6 - mm/mempolicy.c | 1 - mm/migrate.c | 6 - mm/mincore.c | 1 - mm/mlock.c | 2 - mm/mm_init.c | 13 +- mm/mmap.c | 1 - mm/mmu_gather.c | 2 - mm/mprotect.c | 1 - mm/mremap.c | 1 - mm/nommu.c | 1 - mm/page-writeback.c | 6 +- mm/page_alloc.c | 13 +- mm/page_counter.c | 1 - mm/page_ext.c | 1 - mm/page_idle.c | 2 - mm/page_io.c | 2 - mm/page_owner.c | 1 - mm/percpu.c | 5 - mm/rmap.c | 2 - mm/shmem.c | 19 +- mm/shuffle.c | 6 +- mm/slab.c | 3 - mm/swap_cgroup.c | 4 - mm/swapfile.c | 14 - mm/truncate.c | 4 - mm/userfaultfd.c | 3 - mm/util.c | 1 - mm/vmalloc.c | 5 - mm/vmscan.c | 29 +- mm/vmstat.c | 4 - mm/workingset.c | 1 - mm/z3fold.c | 15 +- mm/zsmalloc.c | 1 - mm/zswap.c | 1 - net/batman-adv/tp_meter.c | 2 - net/bpf/test_run.c | 1 - net/bridge/br_netlink.c | 1 - net/core/dev.c | 4 - net/core/neighbour.c | 1 - net/core/net_namespace.c | 1 - net/core/netclassid_cgroup.c | 1 - net/core/rtnetlink.c | 1 - net/core/sock.c | 2 - net/ipv4/inet_connection_sock.c | 3 - net/ipv4/inet_diag.c | 1 - net/ipv4/inet_hashtables.c | 1 - net/ipv4/inet_timewait_sock.c | 1 - net/ipv4/inetpeer.c | 1 - net/ipv4/netfilter/arp_tables.c | 2 - net/ipv4/netfilter/ip_tables.c | 3 - net/ipv4/nexthop.c | 1 - net/ipv4/tcp_ipv4.c | 2 - net/ipv4/udp.c | 2 - net/ipv6/fib6_rules.c | 1 - net/ipv6/netfilter/ip6_tables.c | 2 - net/ipv6/udp.c | 2 - net/mptcp/mptcp_diag.c | 2 - net/mptcp/pm_netlink.c | 5 - net/mptcp/protocol.c | 1 - net/netfilter/ipset/ip_set_core.c | 1 - net/netfilter/ipvs/ip_vs_est.c | 3 - net/netfilter/nf_conncount.c | 2 - net/netfilter/nf_conntrack_core.c | 3 - net/netfilter/nf_conntrack_ecache.c | 3 - net/netfilter/nf_tables_api.c | 2 - net/netfilter/nft_set_rbtree.c | 2 - net/netfilter/x_tables.c | 3 +- net/netfilter/xt_hashlimit.c | 1 - net/netlink/af_netlink.c | 1 - net/rds/ib_recv.c | 2 - net/rds/tcp.c | 2 +- net/rds/threads.c | 1 - net/rxrpc/call_object.c | 2 +- net/sched/sch_api.c | 3 - net/sctp/socket.c | 1 - net/socket.c | 2 - net/sunrpc/cache.c | 11 +- net/sunrpc/sched.c | 2 +- net/sunrpc/svc_xprt.c | 1 - net/sunrpc/xprtsock.c | 2 - net/tipc/core.c | 2 +- net/tipc/topsrv.c | 3 - net/unix/af_unix.c | 5 +- net/x25/af_x25.c | 1 - scripts/coccinelle/api/cond_resched.cocci | 53 ++ security/keys/gc.c | 1 - security/landlock/fs.c | 1 - security/selinux/ss/hashtab.h | 2 - security/selinux/ss/policydb.c | 6 - security/selinux/ss/services.c | 1 - security/selinux/ss/sidtab.c | 1 - sound/arm/aaci.c | 2 +- sound/core/seq/seq_virmidi.c | 2 - sound/hda/hdac_controller.c | 1 - sound/isa/sb/emu8000_patch.c | 5 - sound/isa/sb/emu8000_pcm.c | 2 +- sound/isa/wss/wss_lib.c | 1 - sound/pci/echoaudio/echoaudio_dsp.c | 2 - sound/pci/ens1370.c | 1 - sound/pci/es1968.c | 2 +- sound/pci/lola/lola.c | 1 - sound/pci/mixart/mixart_hwdep.c | 2 +- sound/pci/pcxhr/pcxhr_core.c | 5 - sound/pci/vx222/vx222_ops.c | 2 - sound/x86/intel_hdmi_audio.c | 1 - virt/kvm/pfncache.c | 2 - 596 files changed, 881 insertions(+), 2813 deletions(-) delete mode 100644 include/linux/livepatch_sched.h delete mode 100644 include/linux/sched/cond_resched.h create mode 100644 scripts/coccinelle/api/cond_resched.cocci -- 2.31.1