On Thu, 12 Mar 2020 at 19:32, John Fastabend <john.fastabend@xxxxxxxxx> wrote:
>
> The restriction that the maps can not grow/shrink is perhaps limiting a
> bit. I can see how resizing might be useful. In my original load balancer
> case a single application owned all the socks so there was no need to
> ever pull them back out of the map. We "knew" where they were. I think
> resize ops could be added without to much redesign. Or a CREATE flag could
> be used to add it as a new entry if needed. At some point I guess someone
> will request it as a feature for Cilium for example. OTOH I'm not sure
> off-hand how to use a dynamically sized table for load balancing. I
> should know the size because I want to say something about the hash
> distribution and if the size is changing do I still know this? I really
> haven't considered it much.

I agree, magically changing the size of a sockmap isn't useful. We don't
want to do load-balancing, but still need stable indices into the map:

- derive some sort of ID from the skb
- look up the ID in the sockmap
- return the socket as the result of the program

If the ID changes we need to coordinate this with the eBPF, or at least
update some other map in a race-free way.

[...]

>
> Rather than expose the fd's to user space would a map copy api be
> useful? I could imagine some useful cases where copy might be used
>
>  map_copy(map *A, map *B, map_key *key)
>
> would need to sort out what to do with key/value size changes. But
> I can imagine for upgrades this might be useful.

I guess that would be a way to approach it. I'd probably find a primitive
to copy a whole map atomically more useful, but haven't really thought
about it much.

>
> Another option I've been considering the need for a garbage collection
> thread trigger at regular intervals. This BPF program could do the
> copy from map to map in kernel space never exposing fds out of kernel

So, have a dummy prog that has both maps, and copies from old to new.
Invoke that from user space via BPF_PROG_TEST_RUN? I guess that would
work, but falls back to being "protected" by CAP_SYS_ADMIN. It's just
more cumbersome than doing it in user space!

Lorenz

-- 
Lorenz Bauer | Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com