On Mon, Aug 21, 2023 at 5:55 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > > Hi, > > On 8/22/2023 7:49 AM, Alexei Starovoitov wrote: > > On Sat, Aug 19, 2023 at 3:39 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > >> Hi, > >> > >> On 8/18/2023 7:00 AM, Alexei Starovoitov wrote: > >>> On Wed, Aug 16, 2023 at 11:35 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > >>>> ping ? > >>> Sorry for the delay. I've been on PTO. > >>> > >>>> On 8/3/2023 9:28 PM, Hou Tao wrote: > >>>>> On 8/3/2023 9:19 PM, Hou Tao wrote: > >>>>>> Hi, > >>>>>> > >>>>>> I am preparing for qp-trie v4, but I need some help on how to support > >>>>>> variable-sized key in bpf syscall. The implementation of qp-trie needs > >>>>>> to distinguish between dynptr key from bpf program and variable-sized > >>>>>> key from bpf syscall. In v3, I added a new dynptr type: > >>>>>> BPF_DYNPTR_TYPE_USER for variable-sized key from bpf syscall [0], so > >>>>>> both bpf program and bpf syscall will use the same type to represent the > >>>>>> variable-sized key, but Andrii thought ptr+size tuple was simpler and > >>>>>> would be enough for user APIs, so in v4, the type of key for bpf program > >>>>>> and syscall will be different. One way to handle that is to add a new > >>>>>> parameter in .map_lookup_elem()/.map_delete_elem()/.map_update_elem() to > >>>>>> tell whether the key comes from bpf program or syscall or introduce new > >>>>>> APIs in bpf_map_ops for variable-sized key related syscall, but I think > >>>>>> it will introduce too much churn. Considering that the size of > >>>>>> bpf_dynptr_kern is 8-bytes aligned, so I think maybe I could reuse the > >>>>>> lowest 1-bit of key pointer to tell qp-trie whether or not it is a > >>>>>> bpf_dynptr_kern or a variable-sized key pointer from syscall. For > >>>>>> bpf_dynptr_kern, because it is 8B-aligned, so its lowest bit must be 0, > >>>>>> and for variable-sized key from syscall, I could allocated a 4B-aligned > >>>>>> pointer and setting the lowest bit as 1, so qp-trie can distinguish > >>>>>> between these two types of pointer. The question is that I am not sure > >>>>>> whether the idea above is a good one or not. Does it sound fragile ? Or > >>>>>> is there any better way to handle that ? > >>> Let's avoid bit hacks. They're not extensible and should be used > >>> only in cases where performance matters a lot or memory constraints are extreme. > >> I see. Neither the performance reason nor the memory limitation fit here. > >>> ptr/sz tuple from syscall side sounds the simplest. > >>> I agree with Andrii exposing the dynptr concept to user space > >>> and especially as part of syscall is unnecessary. > >>> We already have LPM as a precedent. Maybe we can do the same here? > >>> No need to add new sys_bpf commands. > >> There is no need to add new sys_bpf commands. We can extend bpf_attr to > >> support variable-sized key in qp-trie for bpf syscall. The probem is the > >> keys from bpf syscall and bpf program are different: bpf syscall uses > >> ptr+size tuple and bpf program uses dynptr, but the APIs in bpf_map_ops > >> only uses a pointer to represent the key, so qp-trie can not distinguish > >> between the keys from bpf syscall and bpf program. In qp-trie v1, the > >> key of qp-trie was similar with LPM-trie: both the syscall and program > >> used the same key format. But the key format for bpf program changed to > >> dynptr in qp-trie v2 according to the suggestion from Andrii. I think it > >> is also a bad ideal to go back to v1 again. > >> > >>> If the existing bpf_map_lookup_elem() helper doesn't fit qp-tree we can > >>> use new kfuncs from bpf prog and LPM-like map accessors from syscall. > >> It is a feasible solution, but it will make qp-trie be different with > >> other map types. I will try to extend the APIs in bpf_map_ops first to > >> see how much churns it may introduce. > > you mean you want to try to dynamically adapt bpf_map_lookup_elem() > > to consider 'void *key' as a pointer to dynptr for bpf prog and > > lpm-like tuple for syscall? > > I'm afraid the verifier changes will be messy, since PTR_TO_MAP_KEY is > > quite special. > > No. I didn't plan to do that. I am trying to add three new APIs in > bpf_map_ops to handle lookup/update/delete operation from bpf syscall > (e.g, map_lookup_elem_syscall). So bpf program and bpf syscall can use > different API to operate on qp-trie. How does bpf prog side api look like? I thought we wanted to use dynptr as a key?