* Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Mon, Sep 11, 2023 at 02:16:18PM -0700, Linus Torvalds wrote:
> > On Mon, 11 Sept 2023 at 13:50, Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Except we've actually been *adding* to this whole mess, rather than
> > > removing it. So we have actively *expanded* on that preemption choice
> > > with PREEMPT_DYNAMIC.
> >
> > Actually, that config option makes no sense.
> >
> > It makes the sched_cond() behavior conditional with a static call.
> >
> > But all the *real* overhead is still there and unconditional (ie all
> > the preempt count updates and the "did it go down to zero and we need
> > to check" code).
> >
> > That just seems stupid. It seems to have all the overhead of a
> > preemptible kernel, just not doing the preemption.
> >
> > So I must be mis-reading this, or just missing something important.
> >
> > The real cost seems to be
> >
> >    PREEMPT_BUILD -> PREEMPTION -> PREEMPT_COUNT
> >
> > and PREEMPT vs PREEMPT_DYNAMIC makes no difference to that, since both
> > will end up with that, and thus both cases will have all the spinlock
> > preempt count stuff.
> >
> > There must be some non-preempt_count cost that people worry about.
> >
> > Or maybe I'm just mis-reading the Kconfig stuff entirely. That's
> > possible, because this seems *so* pointless to me.
> >
> > Somebody please hit me with a clue-bat to the noggin.
>
> Well, I was about to reply to your previous email explaining this, but
> this one time I did read more email..
>
> Yes, PREEMPT_DYNAMIC has all the preempt count twiddling and only nops
> out the schedule()/cond_resched() calls where appropriate.
>
> This work was done by a distro (SuSE) and if they're willing to ship this
> I'm thinking the overheads are acceptable to them.
>
> For a significant number of workloads the real overhead is the extra
> preemptions themselves more than the counting -- but yes, the counting is
> measurable, but probably in the noise compared to some of the other
> horrible things we have done the past years.
>
> Anyway, if distros are fine shipping with PREEMPT_DYNAMIC, then yes,
> deleting the other options is definitely an option.

Yes, so my understanding is that distros generally worry more about
macro-overhead, for example material changes to a random subset of key
benchmarks that specific enterprise customers care about, and distros are
not nearly as sensitive about the micro-overhead that preempt_count()
maintenance causes.

PREEMPT_DYNAMIC is basically a reflection of that: the desire to have only
a single kernel image, but with a boot-time toggle to differentiate between
desktop and server loads, offering CONFIG_PREEMPT behavior (desktop) but
also PREEMPT_VOLUNTARY behavior (server).

There's also the view that PREEMPT kernels are a bit more QA-friendly,
because atomic code sequences are much better defined & enforced via kernel
warnings. Without preempt_count we only have irqs-off warnings, which cover
only a small fraction of all critical sections in the kernel.

Ideally we'd be able to patch out most of the preempt_count maintenance
overhead too - OTOH these days it's little more than noise on most CPUs,
considering the kind of horrible security-workaround overhead we have on
almost all x86 CPU types ... :-/

Thanks,

	Ingo
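
For readers following the thread, here is a rough user-space sketch in C of
the trade-off discussed above. It is not kernel code. Every name in it
(model_count, model_set_mode(), and so on) is made up for illustration, and
plain function pointers stand in for the kernel's static calls. It assumes
a simplified two-mode picture ("voluntary" and "full") and only tries to
show that the count updates happen in both modes, while the boot-time
choice merely selects which scheduling hook is a no-op.

```c
/* User-space model only; all names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static int  model_count;          /* stands in for the preempt count */
static long model_count_updates;  /* how often the count was touched */
static long model_schedules;      /* how often we "rescheduled" */
static bool model_need_resched;   /* stands in for the need-resched flag */

/* Reschedule if requested; returns 1 if it did, 0 otherwise. */
static int do_schedule(void)
{
    if (!model_need_resched)
        return 0;
    model_need_resched = false;
    model_schedules++;
    return 1;
}

/* The "nopped out" variant of a call site. */
static int do_nothing(void) { return 0; }

/* Function pointers standing in for patched static-call sites. */
static int (*model_cond_resched)(void) = do_schedule;
static int (*model_preempt_schedule)(void) = do_nothing;

/* Models the boot-time toggle; returns 0 on success, -1 if unknown. */
int model_set_mode(const char *mode)
{
    if (strcmp(mode, "voluntary") == 0) {
        model_cond_resched = do_schedule;
        model_preempt_schedule = do_nothing;
    } else if (strcmp(mode, "full") == 0) {
        model_cond_resched = do_nothing;
        model_preempt_schedule = do_schedule;
    } else {
        return -1;
    }
    return 0;
}

/* The count update is unconditional: it is paid for in every mode. */
void model_preempt_disable(void)
{
    model_count++;
    model_count_updates++;
}

void model_preempt_enable(void)
{
    /* With a count available, unbalanced sections can be caught. */
    assert(model_count > 0);
    model_count--;
    model_count_updates++;
    /* The "did it go down to zero" check is also unconditional. */
    if (model_count == 0 && model_need_resched)
        model_preempt_schedule();
}
```

In this model a disable/enable pair costs two count updates in either mode.
Only the pointer targets differ: in "voluntary" the pending reschedule waits
for an explicit model_cond_resched() call, while in "full" it happens as
soon as the count drops back to zero.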