On Thu, Apr 21, 2022 at 6:47 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Apr 19, 2022 at 05:21:40PM -0700, Dave Marchevsky wrote:
> > Currently, each local_storage type (sk, inode, task) has a 16-entry
> > cache for local_storage data associated with a particular map. A
> > local_storage map is assigned a fixed cache_idx when it is allocated.
> > When looking in a local_storage for data associated with a map, the
> > cache entry at cache_idx is the only place the map can appear in
> > cache. If the map's data is not in cache, it is placed there after a
> > search through the cache hlist. When there are >16 local_storage maps
> > allocated for a local_storage type, multiple maps have the same
> > cache_idx and thus may knock each other out of cache.
> >
> > BPF programs that use local_storage may require fast and consistent
> > local storage access. For example, a BPF program using task local
> > storage to make scheduling decisions could not tolerate a long hlist
> > search for its local_storage, as this would take cycles away from
> > applications. Providing a mechanism for such a program to keep its
> > local_storage data in cache at all times would guarantee fast access.
> >
> > This series introduces a BPF_LOCAL_STORAGE_FORCE_CACHE flag that can
> > be set on sk, inode, and task local_storage maps via map_extras. When
> > a map with the FORCE_CACHE flag set is allocated, it is assigned an
> > 'exclusive' cache slot that it can't be evicted from until the map is
> > freed.
> >
> > If there are no slots available to exclusively claim, the allocation
> > fails. BPF programs are expected to use BPF_LOCAL_STORAGE_FORCE_CACHE
> > only if their data _must_ be in cache.
>
> I'm afraid the new uapi flag doesn't solve this problem.
> Sooner or later every bpf program would deem itself "important" and
> performance critical. All of them will be using the FORCE_CACHE flag
> and we will be back to the same situation.
>
> Also, please share performance data that shows more than 16 programs
> using local storage at the same time, where the existing simple cache
> replacement logic is not enough.
> For any kind of link list walking to become an issue there have to be
> at least 17 progs. Two progs would have to pick up the same cache_idx
> and run interleaved to evict each other.
> It feels like an unlikely scenario, so real data would be good to see.
> If it really is an issue we might need different caching logic.
> For example, instead of a single link list per local storage we might
> have 16 link lists. cache_idx can point to a slot.
> If a map's element is not 1st it will be 2nd in a much shorter link
> list. With 16 slots the link lists will have 2 elements until 32 bpf
> progs are using local storage.
> We could also get rid of the cache entirely and replace it with a mini
> hash table of N elements where map_id would be an index into the hash
> table.

This is a tangent to this discussion, but I was actually wondering why
the elements in bpf_local_storage are stored in a list rather than a
hashtable. Is there a specific reason for this?

> All sorts of other algorithms are possible.
> In any case the bpf user shouldn't be telling the kernel about the
> "importance" of its program. If a program is indeed executing a lot,
> the kernel should be caching/accelerating it where it can.
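
For what it's worth, here is how I read the "one short list per slot"
suggestion above. This is a rough userspace sketch, not the actual
bpf_local_storage code; the struct and function names (fake_selem,
fake_local_storage, slot_of) are made up for illustration, and the real
thing would use hlists, RCU, locking, etc. The point is just that each
local storage keeps NR_SLOTS list heads instead of one long list, so a
lookup only walks the few maps that share a slot:

/*
 * Illustrative sketch only -- names and layout are invented, not taken
 * from the kernel sources.
 */
#include <stdint.h>
#include <stdio.h>

#define NR_SLOTS 16

struct fake_selem {
	struct fake_selem *next;
	uint32_t map_id;	/* which map this element belongs to */
	void *data;
};

struct fake_local_storage {
	/* one short list per slot instead of a single long list */
	struct fake_selem *slots[NR_SLOTS];
};

/* cache_idx is assigned once per map; here we just derive it from map_id */
static unsigned int slot_of(uint32_t map_id)
{
	return map_id % NR_SLOTS;
}

static struct fake_selem *lookup(struct fake_local_storage *ls,
				 uint32_t map_id)
{
	struct fake_selem *e;

	/* only walk the elements that happen to share this slot */
	for (e = ls->slots[slot_of(map_id)]; e; e = e->next)
		if (e->map_id == map_id)
			return e;
	return NULL;
}

static void insert(struct fake_local_storage *ls, struct fake_selem *e)
{
	unsigned int s = slot_of(e->map_id);

	e->next = ls->slots[s];
	ls->slots[s] = e;
}

int main(void)
{
	struct fake_local_storage ls = { 0 };
	struct fake_selem a = { .map_id = 1 }, b = { .map_id = 17 };

	/* map_id 1 and 17 collide in slot 1, but the walk is only 2 long */
	insert(&ls, &a);
	insert(&ls, &b);
	printf("found map 17: %p\n", (void *)lookup(&ls, 17));
	return 0;
}

If that matches the intent, two maps only pay for a longer walk when
they land in the same slot, and even then the walk stays short until
well past 32 maps, without any new uapi flag.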