Re: [PATCH bpf-next v3 2/3] bpf: Support kptrs in local storage maps

On Sat, 4 Mar 2023 at 08:53, Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 2/25/23 7:40 AM, Kumar Kartikeya Dwivedi wrote:
> > diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
> > index 6d37a40cd90e..0fe92986412b 100644
> > --- a/include/linux/bpf_local_storage.h
> > +++ b/include/linux/bpf_local_storage.h
> > @@ -74,6 +74,12 @@ struct bpf_local_storage_elem {
> >       struct hlist_node snode;        /* Linked to bpf_local_storage */
> >       struct bpf_local_storage __rcu *local_storage;
> >       struct rcu_head rcu;
> > +     bool can_use_smap; /* Is it safe to access smap in bpf_selem_free_* RCU
> > +                         * callbacks? bpf_local_storage_map_free only
> > +                         * executes rcu_barrier when there are special
> > +                         * fields, this field remembers that to ensure we
> > +                         * don't access already freed smap in sdata.
> > +                         */
> >       /* 8 bytes hole */
> >       /* The data is stored in another cacheline to minimize
> >        * the number of cachelines access during a cache hit.
> > diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> > index 58da17ae5124..2bdd722fe293 100644
> > --- a/kernel/bpf/bpf_local_storage.c
> > +++ b/kernel/bpf/bpf_local_storage.c
> > @@ -85,6 +85,7 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> >       if (selem) {
> >               if (value)
> >                       copy_map_value(&smap->map, SDATA(selem)->data, value);
> > +             /* No need to call check_and_init_map_value as memory is zero init */
> >               return selem;
> >       }
> >
> > @@ -113,10 +114,25 @@ static void bpf_selem_free_rcu(struct rcu_head *rcu)
> >       struct bpf_local_storage_elem *selem;
> >
> >       selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
> > +     /* The can_use_smap bool is set whenever we need to free additional
> > +      * fields in selem data before freeing selem. bpf_local_storage_map_free
> > +      * only executes rcu_barrier to wait for RCU callbacks when it has
> > +      * special fields, hence we can only conditionally dereference smap, as
> > +      * by this time the map might have already been freed without waiting
> > +      * for our call_rcu callback if it did not have any special fields.
> > +      */
> > +     if (selem->can_use_smap)
> > +             bpf_obj_free_fields(SDATA(selem)->smap->map.record, SDATA(selem)->data);
> > +     kfree(selem);
> > +}
> > +
> > +static void bpf_selem_free_tasks_trace_rcu(struct rcu_head *rcu)
> > +{
> > +     /* Free directly if Tasks Trace RCU GP also implies RCU GP */
> >       if (rcu_trace_implies_rcu_gp())
> > -             kfree(selem);
> > +             bpf_selem_free_rcu(rcu);
> >       else
> > -             kfree_rcu(selem, rcu);
> > +             call_rcu(rcu, bpf_selem_free_rcu);
> >   }
> >
> >   /* local_storage->lock must be held and selem->local_storage == local_storage.
> > @@ -170,9 +186,9 @@ static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_stor
> >               RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
> >
> >       if (use_trace_rcu)
> > -             call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_rcu);
> > +             call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_tasks_trace_rcu);
> >       else
> > -             kfree_rcu(selem, rcu);
> > +             call_rcu(&selem->rcu, bpf_selem_free_rcu);
>
> After another thought, bpf_obj_free_fields() does not need to go through the rcu
> gp, right?
>
> bpf_obj_free_fields() can be done just before the call_rcu_tasks_trace() and the
> call_rcu() here. In hashtab, bpf_obj_free_fields() is also done just before
> bpf_mem_cache_free().

Perhaps not. But the original hashtab code, before it was converted to
use bpf_mem_cache, did free timers and kptrs only after a complete RCU
grace period in the kmalloc case. My main idea was to delay the freeing
until the last possible point, where the memory is returned for reuse.
For hashtab that no longer involves an RCU grace period, because memory
can be reused as soon as it is returned to bpf_mem_cache. The same goes
for array maps, where update does the freeing.

bpf_obj_free_fields can potentially do a lot of work: it may have to
acquire the bpf_spin_lock in the map value, and even more so now that
map values can contain lists and rbtrees, it has to drain all of their
elements (which might have special fields of their own). If we have a
choice, it is usually better not to do that in the context of the
program calling update or delete, since it can introduce unexpected
delays. Here we go through an RCU callback in all cases, so I think it
is better to delay freeing the fields and do it in the RCU callback,
since we are doing call_rcu anyway.
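
To put the two options side by side, doing it before queueing the
callback would look roughly like this (untested sketch, only to
illustrate the placement in bpf_selem_unlink_storage_nolock):

	/* Variant suggested above: free the special fields in the
	 * caller's context (the program doing update/delete), and let
	 * the RCU callback only kfree the selem.
	 */
	bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
	if (use_trace_rcu)
		call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_tasks_trace_rcu);
	else
		call_rcu(&selem->rcu, bpf_selem_free_rcu);

with bpf_selem_free_rcu reduced back to a plain kfree, whereas in the
patch above bpf_obj_free_fields runs inside bpf_selem_free_rcu itself,
i.e. after the grace period, right before the element memory is freed.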


