On 2/17/20 5:09 AM, Peter Zijlstra wrote:
> On Fri, Feb 14, 2020 at 01:44:32PM -0700, Jens Axboe wrote:
>
> I've not looked at git trees yet, but the below doesn't apply to
> anything I have at hand.
>
> Anyway, I think I can still make sense of it -- just a rename or two
> seems to be missing.
>
> A few notes on the below...

Thanks for continuing to look at it, while we both try and make sense
of it :-)

>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 04278493bf15..447b06c6bed0 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -685,6 +685,11 @@ struct task_struct {
>>  #endif
>>  	struct sched_dl_entity		dl;
>>
>> +#ifdef CONFIG_IO_URING
>> +	struct list_head		uring_work;
>> +	raw_spinlock_t			uring_lock;
>> +#endif
>> +
>
> Could we pretty please use struct callback_head for this, just like
> task_work() and RCU? Look at task_work_add() for inspiration.

Sure, so add a new one, sched_work, and have it get this sched-in or
sched-out behavior. The only potential hitch I see there is related to
ordering, which is more of a fairness than a correctness issue. I'm
going to ignore that for now, and we can always revisit later.
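
Roughly what I have in mind, modeled on task_work_add() -- just a
sketch, sched_work_add()/sched_work_run() and the ->sched_work field
are placeholder names, and this ignores the task-exit race that
task_work handles with work_exited:

/*
 * Sketch only: push a callback onto the task's sched_work stack,
 * lockless, using the same cmpxchg scheme as task_work_add().
 */
static int sched_work_add(struct task_struct *task,
			  struct callback_head *work)
{
	struct callback_head *head;

	do {
		head = READ_ONCE(task->sched_work);
		work->next = head;
	} while (cmpxchg(&task->sched_work, head, work) != head);

	return 0;
}

/*
 * Pop and run the whole stack. Entries run in reverse order of
 * addition (LIFO), which is the ordering (fairness, not correctness)
 * wrinkle mentioned above.
 */
static void sched_work_run(struct task_struct *task)
{
	struct callback_head *work, *next;

	work = xchg(&task->sched_work, NULL);
	while (work) {
		next = work->next;
		work->func(work);
		work = next;
	}
}

/*
 * Hook for the sched-in/sched-out paths. With ->sched_work placed
 * next to the other hot task_struct fields, as you suggest below,
 * this NULL check shouldn't touch a cold cacheline.
 */
static inline void sched_work_sched(struct task_struct *tsk)
{
	if (unlikely(tsk->sched_work))
		sched_work_run(tsk);
}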

> And maybe remove the uring naming from this.

No problem.

>>  #ifdef CONFIG_UCLAMP_TASK
>>  	/* Clamp values requested for a scheduling entity */
>>  	struct uclamp_se uclamp_req[UCLAMP_CNT];
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 51ca491d99ed..170fefa1caf8 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2717,6 +2717,11 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
>>  	INIT_HLIST_HEAD(&p->preempt_notifiers);
>>  #endif
>>
>> +#ifdef CONFIG_IO_URING
>> +	INIT_LIST_HEAD(&p->uring_work);
>> +	raw_spin_lock_init(&p->uring_lock);
>> +#endif
>> +
>>  #ifdef CONFIG_COMPACTION
>>  	p->capture_control = NULL;
>>  #endif
>> @@ -4104,6 +4109,20 @@ void __noreturn do_task_dead(void)
>>  		cpu_relax();
>>  }
>>
>> +#ifdef CONFIG_IO_URING
>> +extern void io_uring_task_handler(struct task_struct *tsk);
>> +
>> +static inline void io_uring_handler(struct task_struct *tsk)
>> +{
>> +	if (!list_empty(&tsk->uring_work))
>> +		io_uring_task_handler(tsk);
>> +}
>> +#else /* !CONFIG_IO_URING */
>> +static inline void io_uring_handler(struct task_struct *tsk)
>> +{
>> +}
>> +#endif
>> +
>>  static void sched_out_update(struct task_struct *tsk)
>>  {
>>  	/*
>> @@ -4121,6 +4140,7 @@ static void sched_out_update(struct task_struct *tsk)
>>  		io_wq_worker_sleeping(tsk);
>>  		preempt_enable_no_resched();
>>  	}
>> +	io_uring_handler(tsk);
>>  }
>>
>>  static void sched_in_update(struct task_struct *tsk)
>> @@ -4131,6 +4151,7 @@ static void sched_in_update(struct task_struct *tsk)
>>  		else
>>  			io_wq_worker_running(tsk);
>>  	}
>> +	io_uring_handler(tsk);
>>  }
>
> The problem I have here is that we have an unconditional load of the
> cacheline that has ->uring_work in.
>
> /me curses about how nobody seems interested in building useful
> cacheline analysis tools :/
>
> Lemme see if I can find a spot for this... perhaps something like so?
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 0918904c939d..4fba93293fa1 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -649,6 +649,7 @@ struct task_struct {
>  	/* Per task flags (PF_*), defined further below: */
>  	unsigned int			flags;
>  	unsigned int			ptrace;
> +	int				on_rq;
>
>  #ifdef CONFIG_SMP
>  	struct llist_node		wake_entry;
> @@ -671,14 +672,16 @@ struct task_struct {
>  	int				recent_used_cpu;
>  	int				wake_cpu;
>  #endif
> -	int				on_rq;
>
>  	int				prio;
>  	int				static_prio;
>  	int				normal_prio;
>  	unsigned int			rt_priority;
>
> +	struct callback_head		*sched_work;
> +
>  	const struct sched_class	*sched_class;
> +
>  	struct sched_entity		se;
>  	struct sched_rt_entity		rt;
>  #ifdef CONFIG_CGROUP_SCHED

Thanks, I'll kick off the series by doing it based on this instead.

-- 
Jens Axboe