On Mon, Aug 14, 2023 at 09:55:37AM -0700, Martin KaFai Lau wrote: > On 8/11/23 4:36 PM, David Vernet wrote: > > I see, thanks for explaining. This is why sched_ext doesn't really work > > with the BPF_F_LINK version of map update. We can't guarantee that a map > > can be updated if we can't succeed in ->reg(), because we can also race > > with e.g. sysrq unloading the scheduler between ->validate() and > > ->reg(). In a sense, it feels like ->reg() in "updateable" struct_ops > > implementations should be void, whereas in other struct_ops > > implementations like scx() it has to be int *. If validate() is meant to > > prevent the scenario you outlined, can you help me understand why we > > still check the return value of ->reg() in bpf_struct_ops_link_create()? > > Or at the very least it seems like we should WARN_ON()? > > ->regs() can fail if another struct_ops under the same > name has already been loaded to the subsystem. If another > subsystem needs another return value to support .update, I > believe it can be done if that is blocking scx to support > "updateable" link. Ok, so ->validate() is a static check that should either always succeed or always fail, and ->reg() may fail due to runtime circumstances. So a map that passes ->validate() could e.g. retry to create the link in a loop or something. Or create a series of validated struct_ops maps and then have a management layer that destroys and creates links for the map you want to actually use. Thanks for explaining. > > > If it needs to validate struct_ops as a while, > > There was a typo: as a /whole/. > > > > > > > 1. it must be implemented in .validate instead of .reg. Otherwise, it may > > > end up having an unusable map. > > > > Some clarity on this point (why we check ->reg() on the ->validate() > > path) would help me write this comment more clearly. > > > hmm... where does it check ->reg() on the ->validate() now? > > I was meaning the struct_ops supported subsystem should > validate the struct_ops map in '.validate' instead of in > the '.reg'. > > or I misunderstood the question? I just meant that I wasn't understanding why we had to check the return value of ->reg() in bpf_struct_ops_link_create(). Now that I understand the semantics, I can document them.