On Tue, Sep 19 2023 at 14:30, Thomas Gleixner wrote:
> On Mon, Sep 18 2023 at 18:57, Linus Torvalds wrote:
>> Then the question becomes whether we'd want to introduce a *new*
>> concept, which is a "if you are going to schedule, do it now rather
>> than later, because I'm taking a lock, and while it's a preemptible
>> lock, I'd rather not sleep while holding this resource".
>>
>> I suspect we want to avoid that for now, on the assumption that it's
>> hopefully not a problem in practice (the recently addressed problem
>> with might_sleep() was that it actively *moved* the scheduling point
>> to a bad place, not that scheduling could happen there, so instead of
>> optimizing scheduling, it actively pessimized it). But I thought I'd
>> mention it.
>
> I think we want to avoid that completely and if this becomes an issue,
> we'd rather be smart about it at the core level.
>
> It's trivial enough to have a per task counter which tells whether a
> preemptible lock is held (or about to be acquired) or not. Then the
> scheduler can take that hint into account and decide to grant a
> timeslice extension once in the expectation that the task leaves the
> lock held section soonish and either returns to user space or schedules
> out. It still can enforce it later on.
>
> We really want to let the scheduler decide and rather give it proper
> hints at the conceptual level instead of letting developers make random
> decisions which might work well for a particular use case and completely
> suck for the rest. I think we wasted enough time already on those.

Finally I realized why cond_resched() et al. are so disgusting. They are
scope-less and just a random spot which someone decided to be a good
place to reschedule.

But in fact the really relevant measure is scope.
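Going back to the per task counter hint from the quoted text: a tiny user-space model may make the "extend once, enforce later" policy concrete. This is only a sketch under stated assumptions, not kernel code. struct task_hint, hint_lock_enter(), hint_lock_exit(), hint_task_switched_in() and hint_should_preempt() are hypothetical names, and resetting the extension when the task is switched back in is an assumed policy chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the per task hint. All names and
 * fields are illustrative only; none of them exist in the kernel.
 */
struct task_hint {
	unsigned int preemptible_locks; /* >0 while a preemptible lock is held or about to be taken */
	bool extension_granted;         /* one-time extension already used since last switch-in */
};

/* Called before acquiring a preemptible lock; nests like a counter. */
static void hint_lock_enter(struct task_hint *t)
{
	t->preemptible_locks++;
}

/* Called after releasing the lock. */
static void hint_lock_exit(struct task_hint *t)
{
	assert(t->preemptible_locks > 0);
	t->preemptible_locks--;
}

/* Assumed reset point: the task was scheduled out and is running again. */
static void hint_task_switched_in(struct task_hint *t)
{
	t->extension_granted = false;
}

/*
 * Scheduler side, on timeslice expiry: true means preempt now,
 * false means grant the one-time extension.
 */
static bool hint_should_preempt(struct task_hint *t)
{
	if (t->preemptible_locks > 0 && !t->extension_granted) {
		/* Expect the lock held section to end soonish. */
		t->extension_granted = true;
		return false;
	}
	/* No hint, or the extension was already used: enforce. */
	return true;
}
```

The scheduler stays in charge: the counter is only a hint, the extension is granted at most once, and the next expiry is enforced regardless of how many locks are still held.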
Full preemption is scope based:

    preempt_disable();
    do_stuff();
    preempt_enable();

which also nests properly:

    preempt_disable();
    do_stuff()
      preempt_disable();
      do_other_stuff();
      preempt_enable();
    preempt_enable();

cond_resched() cannot nest and is obviously scope-less.

The TIF_ALLOW_RESCHED mechanism, which sparked this discussion, only
pretends to be scoped.

As Peter pointed out, it does not properly nest with other mechanisms and
it cannot even nest in itself because it is boolean.

The worst thing about it is that it is semantically the reverse of the
established model of preempt_disable()/enable(),
i.e. allow_resched()/disallow_resched().

So instead of giving the scheduler a hint about 'this might be a good
place to preempt', providing proper scope would make way more sense:

    preempt_lazy_disable();
    do_stuff();
    preempt_lazy_enable();

That would be the obvious and semantically consistent counterpart to the
existing preemption control primitives with proper nesting support.

might_sleep(), which is in all the lock acquire functions, or your
variant of a hint (resched better now before I take the lock) are the
wrong place:

    hint();
    lock();
    do_stuff();
    unlock();

hint() might schedule, and when the task comes back it might schedule
again immediately because the lock is contended. Again, hint() does not
have scope and might be meaningless or even counterproductive if called
in a deeper callchain.

Proper scope based hints avoid that:

    preempt_lazy_disable();
    lock();
    do_stuff();
    unlock();
    preempt_lazy_enable();

That's way better because it describes the scope, and the task will
either schedule out in lock() on contention or provide a sensible lazy
preemption point in preempt_lazy_enable(). It also nests properly:

    preempt_lazy_disable();
    lock(A);
    do_stuff()
      preempt_lazy_disable();
      lock(B);
      do_other_stuff();
      unlock(B);
      preempt_lazy_enable();
    unlock(A);
    preempt_lazy_enable();

So in this case it does not matter whether do_stuff() is invoked from a
lock held section or not.
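To illustrate why a counter nests where a boolean cannot, here is a small user-space model of the proposed preempt_lazy_disable()/preempt_lazy_enable() scope. It is a sketch under stated assumptions, not the kernel implementation: struct lazy_task, schedule_lazy(), request_lazy_resched() and the bool_scope_*() helpers are hypothetical, and only lazy (throughput) reschedule requests are modelled, not hard preemption or blocking on a contended lock.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of a nestable lazy preemption scope.
 * struct lazy_task, schedule_lazy() and request_lazy_resched() are
 * illustrative names, not kernel APIs.
 */
struct lazy_task {
	unsigned int lazy_count;  /* nesting depth of lazy scopes */
	bool need_resched_lazy;   /* scheduler asked for a throughput-friendly reschedule */
	unsigned int lazy_yields; /* times the task rescheduled at a lazy preemption point */
};

/* Stand-in for calling into the scheduler at a lazy preemption point. */
static void schedule_lazy(struct lazy_task *t)
{
	t->need_resched_lazy = false;
	t->lazy_yields++;
}

static void preempt_lazy_disable(struct lazy_task *t)
{
	t->lazy_count++;
}

static void preempt_lazy_enable(struct lazy_task *t)
{
	assert(t->lazy_count > 0);
	/* Only leaving the outermost scope is a lazy preemption point. */
	if (--t->lazy_count == 0 && t->need_resched_lazy)
		schedule_lazy(t);
}

/*
 * Scheduler side: a lazy request is honoured right away outside any
 * scope and deferred to the outermost preempt_lazy_enable() otherwise.
 * Hard preemption and blocking on a contended lock are not modelled.
 */
static void request_lazy_resched(struct lazy_task *t)
{
	t->need_resched_lazy = true;
	if (t->lazy_count == 0)
		schedule_lazy(t);
}

/* For contrast: a boolean scope marker cannot represent nesting. */
static bool bool_in_scope;

static void bool_scope_enter(void) { bool_in_scope = true; }
/* An inner exit also ends the outer scope. */
static void bool_scope_exit(void) { bool_in_scope = false; }
```

Walking the lock(A)/lock(B) example through this model: a request arriving inside the outer section is deferred, the inner preempt_lazy_enable() only drops the depth to 1, and the reschedule happens at the outer preempt_lazy_enable(). With the boolean marker, the inner exit already clears the flag while the outer section is still active.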
The scope which defines the throughput relevant hint to the scheduler is
correct in any case. Unlike preempt_disable(), the lazy variant prevents
neither scheduling nor preemption, but it's an understandable, properly
nestable mechanism.

I seriously hope to avoid it altogether :)

Thanks,

        tglx