Re: [PATCH bpf-next v3 03/12] bpf: XDP metadata RX kfuncs

Stanislav Fomichev <sdf@xxxxxxxxxx> · Thu, 8 Dec 2022 15:45:49 -0800

On Thu, Dec 8, 2022 at 2:53 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 12/8/22 11:07 AM, Stanislav Fomichev wrote:
> >>> @@ -102,11 +112,25 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> >>>        if (err)
> >>>                goto err_maybe_put;
> >>>
> >>> +     prog->aux->offload_requested = !(attr->prog_flags & BPF_F_XDP_HAS_METADATA);
> >>> +
> >>
> >> If I read the set correctly, bpf prog can either use metadata kfunc or offload
> >> but not both. It is fine to start with only supporting metadata kfunc when there
> >> is no offload but will be useful to understand the reason. I assume an offloaded
> >> bpf prog should still be able to call the bpf helpers like adjust_head/tail and
> >> the same should go for any kfunc?
> >
> > Yes, I'm assuming there should be some work on the offloaded device
> > drivers to support metadata kfuncs.
> > Offloaded kfuncs, in general, seem hard (how do we call kernel func
> > from the device-offloaded prog?); so refusing kfuncs early for the
> > offloaded case seems fair for now?
>
> Ah, ok.  I was actually thinking the HW offloaded prog can just use the software
> ndo_* kfunc (like other bpf-helpers).  From skimming some
> bpf_prog_offload_ops:prepare implementation, I think you are right and it seems
> BPF_PSEUDO_KFUNC_CALL has not been recognized yet.
>
> [ ... ]
>
> >>> @@ -226,10 +263,17 @@ static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
> >>>
> >>>    void bpf_prog_offload_destroy(struct bpf_prog *prog)
> >>>    {
> >>> +     struct net_device *netdev = NULL;
> >>> +
> >>>        down_write(&bpf_devs_lock);
> >>> -     if (prog->aux->offload)
> >>> +     if (prog->aux->offload) {
> >>> +             netdev = prog->aux->offload->netdev;
> >>>                __bpf_prog_offload_destroy(prog);
> >>> +     }
> >>>        up_write(&bpf_devs_lock);
> >>> +
> >>> +     if (netdev)
> >>
> >> May be I have missed a refcnt or lock somewhere.  Is it possible that netdev may
> >> have been freed?
> >
> > Yeah, with the offload framework, there are no refcnts. We put an
> > "offloaded" device into a separate hashtable (protected by
> > rtnl/semaphore).
> > maybe_remove_bound_netdev will re-grab the locks (due to ordering:
> > rtnl->bpf_devs_lock) and remove the device from the hashtable if it's
> > still there.
> > At least this is how, I think, it should work; LMK if something is
> > still fishy here...
> >
> > Or is the concern here that somebody might allocate new netdev reusing
> > the same address? I think I have enough checks in
> > maybe_remove_bound_netdev to guard against that. Or, at least, to make
> > it safe :-)
>
> Race is ok because ondev needs to be removed anyway when '!ondev->offdev &&
> list_empty(&ondev->progs)'?  hmmm... tricky, please add a comment. :)
>
> Why it cannot be done together in the bpf_devs_lock above?  The above cannot
> take an extra rtnl_lock before bpf_devs_lock?

Hm, let's take an extra rtln to avoid this complexity, agree. I guess
I was trying to avoid taking it, but this path is still 'dev_bound ==
true' protected, so shouldn't affect the rest of the progs.