On Thu, Jul 25, 2019 at 04:25:53PM -0700, Brian Vazquez wrote:
> > > > If prev_key is deleted before map_get_next_key(), we get the
> > > > first key again. This is pretty weird.
> > >
> > > Yes, I know. But note that the current scenario happens even for the
> > > old interface (imagine you are walking a map from userspace and when
> > > you tried get_next_key the prev_key was removed; you will start again
> > > from the beginning without noticing it).
> > > I tried to send a patch in the past but I was missing some context:
> > > before NULL was used to get the very first_key, the interface relied
> > > on a random (non-existent) key to retrieve the first_key in the map,
> > > and I was told that we still have to support that scenario.
> >
> > BPF_MAP_DUMP is slightly different, as you may return the first key
> > multiple times in the same call. Also, BPF_MAP_DUMP is new, so we
> > don't have to support legacy scenarios.
> >
> > Since BPF_MAP_DUMP keeps a list of elements, it is possible to try
> > to look up previous keys. Would something along this direction work?
>
> I've been thinking about it, and I think first we need a way to detect
> that, because the key was not present, we got the first_key instead:
>
> - One solution I had in mind was to explicitly ask for the first key
>   with map_get_next_key(map, NULL, first_key) and, while walking the
>   map, check that map_get_next_key(map, prev_key, key) doesn't return
>   that same key. This could be done using memcmp.
> - Discussing with Stan, he mentioned that another option is to support
>   a flag in map_get_next_key to let it know that we want an error
>   instead of the first_key.
>
> After detecting the problem we also need to define what we want to do;
> here are some options:
>
> a) Return the error to the caller.
> b) Retry with previous keys, if any (which would be limited to the keys
>    that we have traversed so far in this dump call).
> c) Continue with the next entries in the map. An array is easy: just get
>    the next valid key (starting at i+1). But a hashmap might be
>    difficult, since starting at the next bucket could potentially skip
>    some keys that were concurrently added to the same bucket where the
>    key used to be, and starting at the same bucket could lead us to
>    return repeated elements.
>
> Or maybe we could support those 3 cases via flags and let the caller
> decide which one to use?

This type of indecision is the reason why I wasn't excited about batch
dumping in the first place and gave a 'soft yes' when Stan mentioned it
during the lsf/mm/bpf uconf.
We probably shouldn't do it.
It feels like this map_dump makes the api more complex and doesn't really
give much benefit to the user other than that a large map dump becomes
faster.
I think we gotta solve this problem differently.
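
For reference, here is a minimal userspace sketch (not from the thread) of
the first detection option Brian describes above: remember the first key
returned for a NULL prev_key, and treat any later get_next_key() result
that memcmp()s equal to it as a sign that prev_key was deleted and the
walk has silently wrapped around. The function and variable names
(walk_map, visit, map_fd, key_size) are placeholders, not anything from
the original mail, and error-code conventions may differ across libbpf
versions.

	#include <string.h>
	#include <errno.h>
	#include <bpf/bpf.h>

	static int walk_map(int map_fd, size_t key_size,
			    void (*visit)(const void *key))
	{
		char first_key[key_size], prev_key[key_size], key[key_size];
		int err;

		/* A NULL prev_key asks the kernel for the very first key. */
		err = bpf_map_get_next_key(map_fd, NULL, first_key);
		if (err)
			return errno == ENOENT ? 0 : -errno;	/* empty map */

		visit(first_key);
		memcpy(prev_key, first_key, key_size);

		for (;;) {
			err = bpf_map_get_next_key(map_fd, prev_key, key);
			if (err)
				return errno == ENOENT ? 0 : -errno;	/* done */

			/*
			 * If prev_key was deleted concurrently, the kernel
			 * falls back to the first key; detect the wrap-around
			 * instead of silently re-dumping the map from the
			 * start. How to recover (options a/b/c above) is left
			 * to the caller.
			 */
			if (!memcmp(key, first_key, key_size))
				return -EAGAIN;

			visit(key);
			memcpy(prev_key, key, key_size);
		}
	}

This only detects the restart when the fallback lands exactly on the
remembered first key; it is meant to illustrate the memcmp idea, not to be
a complete fix for concurrent deletions.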