On Tue, 2019-03-19 at 08:36 -0700, Alexei Starovoitov wrote: > On Fri, Mar 08, 2019 at 08:51:03PM +0000, Saeed Mahameed wrote: > > Thoughts ? > > It's certainly an interesting idea. I think we need to agree on use > cases > and goals first before bikesheding on the solution. Sure will discuss most of the use cases tomorrow in the iovisor meeting. > > > Example use cases (XDP only for now): > > > > 1) Query XDP stats of all XDP netdevs: > > > > xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type > > = > > XDP_STATS); > > > > while (bpf_map_get_next_key(xdp_ctrl, &ifindex, &next_ifindex) == > > 0) { > > bpf_map_lookup_elem(xdp_ctrl, &next_ifindex, &stats); > > // we don't even need to know stats format in this case > > btf_pretty_print(xdp_ctrl->btf, &stats); > > ifindex = next_ifindex; > > } > > this bit show cases advantage of BTF nicely. > > > 2) Setup XDP tx redirect resources on egress netdev (netdev with no > > XDP > > program). > > > > xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type > > = > > XDP_ATTR); > > > > xdp_attr->command = SETUP_REDIRECT; > > xdp_attr->rings.num = 12; > > xdp_attr->rings.size = 128; > > bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr); > > this one starting to become a bit odd, since input arguments are > split > between key an value. ifindex is part of the key whereas command, > rings.num and rings.size are part of the value. The idea is that we need to keep a map semantics (object->vlaue). in this case ( map_attr.ctrl_type = XDP_ATTR ) object (key) is if_index and value is xdp attributes of that if_index, which plays nicely when you want to iterate through all the objects (XDP netdevs). simply think of ifindex as a key and not an input value. different ctrl map types can have different keys and values hence the need for map_create and passing map_attr.ctrl_type attribute which will define what the user is trying to access (which control map) and what will be the key (object) & value (configuration) pair layout. > > 3) Turn On/Off XDP meta data offloads and retrieve meta data BTF > > format > > of specific netdev/hardware: > > > > xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type > > = > > XDP_ATTR); > > > > xdp_attr->command = SETUP_MD; > > xdp_attr->enable_md = 1; > > err = bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr); > > if (err) { > > printf("XDP meta data is not supported on this netdev"); > > return; > > } > > // Query Meta data BTF > > bpf_map_lookup_elem(xdp_ctrl, &ifindex, &xdp_attr); > > md_btf = xdp_attr.md_btf; > > here it gets even weirder, since lookup arguments are also > split between key and value. > ifindex is inside the key while addition info is passed > inside xdp_attr which is part of value. > > I wish we could do maps where every key would have different layout. > Then such api would be suitable. > The existing maps require all keys and values to be inform. > I guess one can argue that such 'control map' can have one element > and one value, but then what is the purpose of 'map_create' > and 'map_delete==close(fd)' operations? > This is exactly the purpose of map_{create/delete} to define what the key,vlaue format will be ( it is not going to be ifindex for all control maps), to make it clear, the key doesn't have to be an ifindex at all, it depends on the map_attr.ctrl_type which the user request on map creation, so different layouts are already supported by this proposal examples of different map_attr.ctrl_type: fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_ATTR) // Key layout == ifindex, vlaue format if_xdp_attributes fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = BPF_PROG_STATS) // Key layout == prog_fd, value format struct bpf_prog_stats fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = BPF_SOCK_ATTR) // Key layout == socket_fd, value layout struct bpf_sock_attr extending this can be done in any linux BPF subsystem, by introducing new map_attr.ctrl_types and new key value layouts of that control type. on map creation we will attach the btf format of the (key, value) pair to the map created for that user. > I think we need something else here. Using BTF to describe > output stats is nice, but using BTF to describe input query is > problematic, > since user cannot know before hand what kernel can and cannot accept. > imo input should be stable uapi with fixed constants whereas > stats-like output can be BTF based and vary from driver to driver > and from one nic version to another. > Well, we can decide to use static stable uapi, and still use this special map to leverage the map API as described in this doc. but also we can allow dynamic value layouts, but any extension should be done to the end of any value structuer and on lookup we will only copy the size that the user already recognizes .. ? or we can assume/force the user to use the map_attributes to figure out format layouts and sizes, but still we will guarantee backward compatibility in kernel level by keeping old format the same, and extension is only allowed to the end of the value structures.