On 08/30, Yonghong Song wrote: > > > On 8/30/19 1:15 PM, Stanislav Fomichev wrote: > > On 08/29, Jakub Kicinski wrote: > >> On Thu, 29 Aug 2019 16:13:59 -0700, Brian Vazquez wrote: > >>>> We need a per-map implementation of the exec side, but roughly maps > >>>> would do: > >>>> > >>>> LIST_HEAD(deleted); > >>>> > >>>> for entry in map { > >>>> struct map_op_ctx { > >>>> .key = entry->key, > >>>> .value = entry->value, > >>>> }; > >>>> > >>>> act = BPF_PROG_RUN(filter, &map_op_ctx); > >>>> if (act & ~ACT_BITS) > >>>> return -EINVAL; > >>>> > >>>> if (act & DELETE) { > >>>> map_unlink(entry); > >>>> list_add(entry, &deleted); > >>>> } > >>>> if (act & STOP) > >>>> break; > >>>> } > >>>> > >>>> synchronize_rcu(); > >>>> > >>>> for entry in deleted { > >>>> struct map_op_ctx { > >>>> .key = entry->key, > >>>> .value = entry->value, > >>>> }; > >>>> > >>>> BPF_PROG_RUN(dumper, &map_op_ctx); > >>>> map_free(entry); > >>>> } > >>>> > >>> Hi Jakub, > >>> > >>> how would that approach support percpu maps? > >>> > >>> I'm thinking of a scenario where you want to do some calculations on > >>> percpu maps and you are interested on the info on all the cpus not > >>> just the one that is running the bpf program. Currently on a pcpu map > >>> the bpf_map_lookup_elem helper only returns the pointer to the data of > >>> the executing cpu. > >> > >> Right, we need to have the iteration outside of the bpf program itself, > >> and pass the element in through the context. That way we can feed each > >> per cpu entry into the program separately. > > My 2 cents: > > > > I personally like Jakub's/Quentin's proposal more. So if I get to choose > > between this series and Jakub's filter+dump in BPF, I'd pick filter+dump > > (pending per-cpu issue which we actually care about). > > > > But if we can have both, I don't have any objections; this patch > > series looks to me a lot like what Brian did, just extended to more > > commands. If we are fine with the shortcomings raised about the > > original series, then let's go with this version. Maybe we can also > > look into addressing these independently. > > > > But if I pretend that we live in an ideal world, I'd just go with > > whatever Jakub and Quentin are doing so we don't have to support > > two APIs that essentially do the same (minus batching update, but > > it looks like there is no clear use case for that yet; maybe). > > > > I guess you can hold off this series a bit and discuss it at LPC, > > you have a talk dedicated to that :-) (and afaiu, you are all going) > > Absolutely. We will have a discussion on map batching and I signed > on with that :-). One of goals for this patch set is for me to explore > what uapi (attr and bpf subcommands) we should expose to users. > Hopefully at that time we will get more clarity > on Jakub's approach and we can discuss how to proceed. Sounds good! Your series didn't have an RFC tag, so I wasn't sure whether we've fully committed to that approach or not.