On Mon, Sep 11, 2023 at 10:04:17AM -0700, Ankur Arora wrote:
>
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
>
> > On Sun, Sep 10, 2023 at 11:32:32AM -0700, Linus Torvalds wrote:
> >
> >> I was hoping that we'd have some generic way to deal with this where
> >> we could just say "this thing is reschedulable", and get rid of - or
> >> at least not increasingly add to - the cond_resched() mess.
> >
> > Isn't that called PREEMPT=y ? That tracks precisely all the constraints
> > required to know when/if we can preempt.
> >
> > The whole voluntary preempt model is basically the traditional
> > co-operative preemption model and that fully relies on manual yields.
>
> Yeah, but as Linus says, this means a lot of code is just full of
> cond_resched(). For instance a loop the process_huge_page() uses
> this pattern:
>
>    for (...) {
>        cond_resched();
>        clear_page(i);
>
>        cond_resched();
>        clear_page(j);
>    }

Yeah, that's what co-operative preemption gets you.

> > The problem with the REP prefix (and Xen hypercalls) is that
> > they're long running instructions and it becomes fundamentally
> > impossible to put a cond_resched() in.
> >
> >> Yes. I'm starting to think that the only sane solution is to
> >> limit cases that can do this a lot, and the "instruction pointer
> >> region" approach would certainly work.
> >
> > From a code locality / I-cache POV, I think a sorted list of
> > (non overlapping) ranges might be best.
>
> Yeah, agreed. There are a few problems with doing that though.
>
> I was thinking of using a check of this kind to schedule out when
> it is executing in this "reschedulable" section:
>         !preempt_count() && in_resched_function(regs->rip);
>
> For preemption=full, this should mostly work.
> For preemption=voluntary, though this'll only work with out-of-line
> locks, not if the lock is inlined.
>
> (Both, should have problems with __this_cpu_* and the like, but
> maybe we can handwave that away with sparse/objtool etc.)

So one thing we can do is combine the TIF_ALLOW_RESCHED with the ranges
thing, and then only search the range when the TIF flag is set.

And I'm thinking it might be a good idea to have objtool validate the
range only contains simple instructions, the moment it contains control
flow I'm thinking it's too complicated.

> How expensive would be always having PREEMPT_COUNT=y?

Effectively I think that is true today. At the very least Debian and
SuSE (I can't find a RHEL .config in a hurry but I would think they do
too) ship with PREEMPT_DYNAMIC=y.

Mel, I'm sure you ran numbers at the time (you always do), what if any
was the measured overhead from PREEMPT_DYNAMIC vs 'regular' voluntary
preemption?
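
To make the "TIF flag plus sorted ranges" idea a bit more concrete, here
is a rough userspace sketch, not kernel code. Everything in it is made up
for illustration: struct resched_range, the table contents, struct
task_sketch and can_resched_at() are hypothetical names, allow_resched
merely stands in for TIF_ALLOW_RESCHED, and the preempt_count field
stands in for preempt_count(). The only point is the ordering of the
checks: the cheap flag test comes first, and the binary search over the
sorted, non-overlapping ranges only happens when the flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one reschedulable instruction-pointer range, [start, end). */
struct resched_range {
	uintptr_t start;
	uintptr_t end;
};

/* Hypothetical table with invented addresses; sorted, non-overlapping. */
static const struct resched_range resched_ranges[] = {
	{ 0x1000, 0x1040 },
	{ 0x2000, 0x2010 },
	{ 0x3000, 0x3100 },
};

#define NR_RANGES (sizeof(resched_ranges) / sizeof(resched_ranges[0]))

/* Sanity check: every range is non-empty and starts after the previous one ends. */
static void validate_ranges(void)
{
	for (size_t i = 0; i < NR_RANGES; i++) {
		assert(resched_ranges[i].start < resched_ranges[i].end);
		if (i > 0)
			assert(resched_ranges[i - 1].end <= resched_ranges[i].start);
	}
}

/* Binary search the sorted table for a range containing ip. */
static bool in_resched_range(uintptr_t ip)
{
	size_t lo = 0, hi = NR_RANGES;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip < resched_ranges[mid].start)
			hi = mid;
		else if (ip >= resched_ranges[mid].end)
			lo = mid + 1;
		else
			return true;
	}
	return false;
}

/* Stand-ins for per-task state; not the real kernel structures. */
struct task_sketch {
	bool allow_resched;	/* models TIF_ALLOW_RESCHED being set */
	int preempt_count;	/* models preempt_count() */
};

/* Flag test first, so the range search is skipped in the common case. */
static bool can_resched_at(const struct task_sketch *t, uintptr_t ip)
{
	if (!t->allow_resched)
		return false;
	if (t->preempt_count != 0)
		return false;
	return in_resched_range(ip);
}
```

With the flag set and a zero preempt count, can_resched_at() returns true
for an ip inside one of the ranges (say 0x1000) and false for one outside
(say 0x1040, since the ranges are half-open). With the flag clear it
returns false without touching the table at all.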