On Tue, Dec 15, 2020 at 6:35 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Dec 15, 2020 at 6:10 PM Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
> >
> > Sure, people also implement CT on native hash map too and timeout
> > with user-space timers. ;)
>
> exactly. what's wrong with that?
> Perfectly fine way to do CT.

Seriously? When we have 8 million entries in a hash map, it is
definitely seriously wrong to purge entries one by one from user-space.

In case you don't believe me, take a look at what cilium CT GC does,
which precisely expires entries one by one in user-space:

https://github.com/cilium/cilium/blob/0f57292c0037ee23ba1ca2f9abb113f36a664645/pkg/bpf/map_linux.go#L728
https://github.com/cilium/cilium/blob/master/pkg/maps/ctmap/ctmap.go#L398

and of course what people complained about:

https://github.com/cilium/cilium/issues/5048

> > > Anything extra can be added on top from user space
> > > which can easily copy with 1 sec granularity.
> >
> > The problem is never about granularity, it is about how efficient we can
> > GC. User-space has to scan the whole table one by one, while the kernel
> > can just do this behind the scene with a much lower overhead.
> >
> > Let's say we arm a timer for each entry in user-space, it requires a syscall
> > and locking buckets each time for each entry. Kernel could do it without
> > any additional syscall and batching. Like I said above, we could have
> > millions of entries, so the overhead would be big in this scenario.
>
> and the user space can pick any other implementation instead
> of trivial entry by entry gc with timer.

Unless they don't have to, right? With a timeout implementation in the
kernel, user space does not need to reinvent any wheel.

> > > Say the kernel does GC and deletes htab entries.
> > > How user space will know that it's gone? There would need to be
> >
> > By a lookup.
> >
> > > an event sent to user space when entry is being deleted by the kernel.
> > > But then such event will be racy. Instead when timers and expirations
> > > are done by user space everything is in sync.
> >
> > Why there has to be an event?
>
> because when any production worthy implementation moves
> past the prototype stage there is something that user space needs to keep
> as well. Sometimes the bpf map in the kernel is alone.
> But a lot of times there is a user space mirror of the map in c++ or golang
> with the same key where user space keeps extra data.

So... what event does the LRU map send when it deletes a different entry
because the map is full?

Thanks.
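
P.S. To put a rough number on the "one by one" cost, here is a toy model of
an entry-by-entry user-space GC. It is only a sketch: ToyMap, its methods and
the syscall counter are hypothetical names made up for illustration, not
libbpf or kernel API, and it assumes each map operation costs one syscall.

```python
class ToyMap:
    """Toy stand-in for a kernel hash map (hypothetical, not a real BPF API).

    Every method call is counted as one syscall, which is the assumption
    this sketch is built on.
    """

    def __init__(self):
        self._data = {}      # key -> expiry timestamp
        self.syscalls = 0

    def update(self, key, expires_at):
        self.syscalls += 1
        self._data[key] = expires_at

    def get_next_key(self, prev=None):
        # Return the key following `prev`, or the first key if prev is None.
        self.syscalls += 1
        keys = list(self._data)
        if prev is None:
            return keys[0] if keys else None
        i = keys.index(prev) + 1
        return keys[i] if i < len(keys) else None

    def lookup(self, key):
        self.syscalls += 1
        return self._data.get(key)

    def delete(self, key):
        self.syscalls += 1
        self._data.pop(key, None)


def user_space_gc(m, now):
    """Walk the whole map and delete expired entries one at a time."""
    removed = 0
    key = m.get_next_key()
    while key is not None:
        nxt = m.get_next_key(key)   # fetch the successor before deleting
        expires_at = m.lookup(key)
        if expires_at is not None and expires_at <= now:
            m.delete(key)
            removed += 1
        key = nxt
    return removed


m = ToyMap()
for k in range(1000):
    # Even keys expired at t=5, odd keys live until t=100.
    m.update(k, 5 if k % 2 == 0 else 100)

m.syscalls = 0                       # count only the GC pass
removed = user_space_gc(m, now=10)
print(removed, m.syscalls)           # 500 2501
```

Under that assumption a single pass costs two calls per entry plus one per
deletion, so it scales with the table size and not with the number of
expired entries. With millions of entries that is millions of syscalls per
pass, which is the overhead I am talking about.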