On Mon, Sep 27, 2021 at 9:41 AM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Fri, Sep 24, 2021 at 04:12:11PM -0700, Andrii Nakryiko wrote: > > > > That's not what I proposed. So let's say somewhere in the kernel we > > have this variable: > > > > static int bpf_bloom_exists = 1; > > > > Now, for bpf_map_lookup_elem() helper we pass data as key pointer. If > > all its hashed bits are set in Bloom filter (it "exists"), we return > > &bpf_bloom_exists. So it's not a NULL pointer. > > imo that's too much of a hack. too bad, because this feels pretty natural in BPF code: int my_key = 1234; if (bpf_map_lookup_elem(&my_bloom_filter, &my_key)) { /* handle "maybe exists" */ } else { /* handle "definitely doesn't exist" */ } With map->value_size == 0 verifier should be able to ensure that whatever non-NULL pointer is returned isn't readable/writable. We might need to fix some BPF verifier internals if that's not supported yet, I haven't checked thoroughly. > > > Now, despite returning a valid pointer, it would be good to prevent > > reading/writing from/to it in a valid BPF program. I'm hoping it is as > > easy as just seetting map->value_size = 0 during map creation. But > > worst case, we can let BPF program just overwrite that 1 with whatever > > they want. It doesn't matter because the contract is that > > bpf_map_lookup_elem() returns non-NULL for "exists" and NULL > > otherwise. > > > > Now, for MAP_LOOKUP_ELEM command on user-space side. I'd say the > > contract should be 0 return code on "exists" (and nothing is written > > to value pointer, perhaps even disallow to specify the non-NULL value > > pointer altogether); -ENOENT, otherwise. Again, worst-case we can > > specify that "value_size" is 1 or 4 bytes and write 1 to it. > > > > Does it make sense? > > > > > > BTW, for STACK/QUEUE maps, I'm not clear why we had to add > > map_peek_elem/map_pop_elem/map_push_elem operations, to be honest. > > They could have been pretty reasonably mapped to map_lookup_elem (peek > > is non-destructive lookup), map_lookup_elem + some flag (pop is > > lookup, but destructive, so flag allows "destruction"), and > > map_update_elem (for push). map_delete_elem would be pop for when > > value can be ignored. > > > > That's to say that I don't consider STACK/QUEUE maps as good examples, > > it's rather a counter-example of maps that barely anyone is using, yet > > it just adds clutter to BPF internals. > > Repurposing lookup/update ops for stack/queue and forcing bpf progs > to pass dummy key would have looked quite ugly. The idea was to pass non-NULL *key* pointer. For bpf_map_update_elem() you'd pass non-NULL key and NULL value. But it's fine, we have peek/pop/push now, might as well use them. From user-space side it's still mapped to LOOKUP/UPDATE/DELETE commands, right? BTW, seems like we have LOOKUP_AND_DELETE command as well which is implemented for STACK/QUEUE maps, but not BLOOM_FILTER. I guess for consistency we'd have to implement that one for user-space as well? > peek/pop/push have one pointer. That pointer points to value. > For bitset we have single pointer as well. > So it makes perfect sense to reuse push/pop/peek and keep bitset > as a value from the verifier side. Ok. > bpf_helper_defs.h could have static inline functions: > bool bpf_bitset_clear(map, key); > bool bpf_bitset_set(map, key); > bool bpf_bitset_test(map, key); > > But they will map to FUNC_map_pop_elem/push/peek as func ids > and will be seen as values from the verifier pov. > > The bpf progs might think of them as keys, but they will be value_size. > The bitset creation could use key_size as an input, but internally > set it as value_size. > Not sure whether such internal vs uapi remaping will not lead > to confusion and bugs though. I think it will and I wouldn't do it. > I agree that key as an input to bitset_clear/set/test make more sense, > but the verifier needs value_size to plug into peek/pop infra. Which is why I initially proposed to not use peek/pop infra and instead do lookup/update/delete, but if a NULL value pointer passed into bpf_map_update_elem() disqualifies this, then it's fine. But I didn't ask to repurpose peek/pop for working with key pointers. > > I don't think it's worth to extend ops with yet another 3 callbacks > and have clean ARG_PTR_TO_UNINIT_MAP_KEY there. > That is probably too much clutter. > > I think > bool bpf_bitset_clear(map, value); > bool bpf_bitset_set(map, value); > bool bpf_bitset_test(map, value); > doesn't look too bad either. > At least this way the bit set map_create op will use value_size > and keep it as value_size. The bpf prog won't be confusing. > That's my preference, so far.