On Fri, Jun 28, 2024 at 3:08 PM Tejun Heo <tj@xxxxxxxxxx> wrote: > > Hello, Alexei. > > On Thu, Jun 27, 2024 at 06:34:14PM -0700, Alexei Starovoitov wrote: > ... > > > +__bpf_kfunc bool __scx_bpf_consume_task(unsigned long it, struct task_struct *p) > > > +{ > > > + struct bpf_iter_scx_dsq_kern *kit = (void *)it; > > > + struct scx_dispatch_q *dsq, *kit_dsq; > > > + struct scx_dsp_ctx *dspc = this_cpu_ptr(scx_dsp_ctx); > > > + struct rq *task_rq; > > > + u64 kit_dsq_seq; > > > + > > > + /* can't trust @kit, carefully fetch the values we need */ > > > + if (get_kernel_nofault(kit_dsq, &kit->dsq) || > > > + get_kernel_nofault(kit_dsq_seq, &kit->dsq_seq)) { > > > + scx_ops_error("invalid @it 0x%lx", it); > > > + return false; > > > + } > > > > With scx_bpf_consume_task() it's only a compile time protection from bugs. > > Since kfunc doesn't dereference any field in kit_dsq it won't crash > > immediately, but let's figure out how to make it work properly. > > > > Since kit_dsq and kit_dsq_seq are pretty much anything in this implementation > > can they be passed as two scalars instead ? > > I guess not, since tricking dsq != kit_dsq and > > time_after64(..,kit_dsq_seq) can lead to real issues ? > > That actually should be okay. It can lead to real but not crashing issues. > The system integrity is going to be fine no matter what the passed in seq > value is. It can just lead to confusing behaviors from the BPF scheduler's > POV, so it's fine to put the onus on the BPF scheduler. > > > Can some of it be mitigated by passing dsq into kfunc that > > was used to init the iter ? > > Then kfunc will read dsq->seq from it instead of kit->dsq_seq ? > > I don't quite follow this part. bpf_iter_scx_dsq_new() takes @dsq_id. The > function looks up the matching DSQ and then the iterator remembers the > current dsq->seq which serves as the threshold (tasks queued afterwards are > ignored). ie. The value needs to be copied at that point to guarantee that > iteration ignores tasks that are queued after the iteration started. > > > > + /* > > > + * Did someone else get to it? @p could have already left $dsq, got > > > + * re-enqueud, or be in the process of being consumed by someone else. > > > + */ > > > + if (unlikely(p->scx.dsq != dsq || > > > + time_after64(p->scx.dsq_seq, kit_dsq_seq) || > > > > In the previous patch you do: > > (s32)(p->scx.dsq_seq - kit->dsq_seq) > 0 > > and here > > time_after64(). > > Close enough, but 32 vs 64 and equality difference? > > Sorry about the sloppiness. It was originally u64 and then I forgot to > update here after changing them to u32. I'll add a helper for the comparison > and update both sites. > > Going back to the sequence number barrier, it's a sort of scoping and one > way to solve it is adding an explicit helper to fetch the target DSQ's > sequence number and then pass it to the consume_task function. ie. sth like: > > barrier_seq = scx_bpf_dsq_seq(dsq_id); > bpf_for_each(scx_dsq, p, dsq_id, 0) { > ... > scx_bpf_consume_task(p, dsq_id, barrier_seq); > } > > This should work but it's not as neat in that it now involves three dsq_id > -> DSQ lookups. Also, there's extra subtlety arising from @barrier_seq being > different from the barrier seq that the scx_dsq iterator would be using. maybe a stupid question, but why scx_dsq iterator cannot accept scx_dispatch_q pointer directly instead of dsq_id and then doing lookup? I.e., what if you had a kfunc to do dsq_id -> scx_dispatch_q lookup (returning PTR_TRUSTED instance), and then you can pass that to iterator, you can pass that to scx_bpf_consume_task() kfunc. Technically, you can also have another kfunc accepting scx_dispatch_q and returning current "timestamp" as some special type (TRUSTED and all), which will be passed into consume_task() as well. Is this too explicit in terms of types? > > As a DSQ iteration needs to have its own barrier sequence, maybe the answer > is to require passing it in as an explicit parameter. ie.: > > barrier_seq = scx_bpf_dsq_seq(dsq_id); > bpf_for_each(scx_dsq, p, dsq_id, barrier_seq, 0) { > ... > scx_bpf_consume_task(p, dsq_id, barrier_seq); > } > > There still are three dsq_id lookups but at least there is just one sequence > number in play. It is more cumbersome tho compared to the current interface: > > bpf_for_each(scx_dsq, p, dsq_id, 0) { > ... > scx_bpf_consume_task(BPF_FOR_EACH_ITER, p); > } > > What do you think? > > Thanks. > > -- > tejun >