On Thu, Apr 21, 2022 at 6:47 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Apr 19, 2022 at 05:21:40PM -0700, Dave Marchevsky wrote:
> > Currently, each local_storage type (sk, inode, task) has a 16-entry
> > cache for local_storage data associated with a particular map. A
> > local_storage map is assigned a fixed cache_idx when it is allocated.
> > When looking in a local_storage for data associated with a map, the
> > cache entry at cache_idx is the only place the map can appear in
> > cache. If the map's data is not in cache, it is placed there after a
> > search through the cache hlist. When there are >16 local_storage maps
> > allocated for a local_storage type, multiple maps have the same
> > cache_idx and thus may knock each other out of cache.
> >
> > BPF programs that use local_storage may require fast and consistent
> > local storage access. For example, a BPF program using task local
> > storage to make scheduling decisions could not tolerate a long hlist
> > search for its local_storage, as this would take cycles away from
> > applications. Providing a mechanism for such a program to keep its
> > local_storage data in cache at all times would guarantee fast access.
> >
> > This series introduces a BPF_LOCAL_STORAGE_FORCE_CACHE flag that can
> > be set on sk, inode, and task local_storage maps via map_extras. When
> > a map with the FORCE_CACHE flag set is allocated, it is assigned an
> > 'exclusive' cache slot that it can't be evicted from until the map is
> > freed.
> >
> > If there are no slots available to exclusively claim, the allocation
> > fails. BPF programs are expected to use BPF_LOCAL_STORAGE_FORCE_CACHE
> > only if their data _must_ be in cache.
>
> I'm afraid the new uapi flag doesn't solve this problem.
> Sooner or later every bpf program would deem itself "important" and
> performance critical. All of them will be using the FORCE_CACHE flag
> and we will be back to the same situation.
>
> Also, please share performance data that shows more than 16 programs
> using local storage at the same time, where the existing simple cache
> replacement logic is not enough.
> For any kind of link list walking to become an issue there have to be
> at least 17 progs. Two progs would have to pick up the same cache_idx
> and run interleaved to evict each other.
> It feels like an unlikely scenario, so real data would be good to see.
> If it really is an issue we might need different caching logic.
> For example, instead of a single link list per local storage we might
> have 16 link lists. cache_idx can point to a slot.
> If a map's element is not 1st it will be 2nd in a much shorter link
> list. With 16 slots the link lists will have 2 elements until 32 bpf
> progs are using local storage.
> We could also get rid of the cache entirely and replace it with a mini
> hash table of N elements where map_id would be an index into the hash
> table.

This is a tangent to this discussion, but I was actually wondering why
the elements in bpf_local_storage are stored in a list rather than a
hashtable. Is there a specific reason for this?

> All sorts of other algorithms are possible.
> In any case the bpf user shouldn't be telling the kernel about the
> "importance" of its program. If a program is indeed executing a lot,
> the kernel should be caching/accelerating it where it can.
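
For what it's worth, here is how I read the "one short list per slot"
suggestion above. This is a rough userspace sketch, not the actual
bpf_local_storage code; the struct and function names (fake_selem,
fake_local_storage, slot_of) are made up for illustration, and the real
thing would use hlists, RCU, locking, etc. The point is just that each
local storage keeps NR_SLOTS list heads instead of one long list, so a
lookup only walks the few maps that share a slot:

/*
 * Illustrative sketch only -- names and layout are invented, not taken
 * from the kernel sources.
 */
#include <stdint.h>
#include <stdio.h>

#define NR_SLOTS 16

struct fake_selem {
	struct fake_selem *next;
	uint32_t map_id;	/* which map this element belongs to */
	void *data;
};

struct fake_local_storage {
	/* one short list per slot instead of a single long list */
	struct fake_selem *slots[NR_SLOTS];
};

/* cache_idx is assigned once per map; here we just derive it from map_id */
static unsigned int slot_of(uint32_t map_id)
{
	return map_id % NR_SLOTS;
}

static struct fake_selem *lookup(struct fake_local_storage *ls,
				 uint32_t map_id)
{
	struct fake_selem *e;

	/* only walk the elements that happen to share this slot */
	for (e = ls->slots[slot_of(map_id)]; e; e = e->next)
		if (e->map_id == map_id)
			return e;
	return NULL;
}

static void insert(struct fake_local_storage *ls, struct fake_selem *e)
{
	unsigned int s = slot_of(e->map_id);

	e->next = ls->slots[s];
	ls->slots[s] = e;
}

int main(void)
{
	struct fake_local_storage ls = { 0 };
	struct fake_selem a = { .map_id = 1 }, b = { .map_id = 17 };

	/* map_id 1 and 17 collide in slot 1, but the walk is only 2 long */
	insert(&ls, &a);
	insert(&ls, &b);
	printf("found map 17: %p\n", (void *)lookup(&ls, 17));
	return 0;
}

If that matches the intent, two maps only pay for a longer walk when
they land in the same slot, and even then the walk stays short until
well past 32 maps, without any new uapi flag.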