Re: [RFC PATCH v9 10/11] selftests: Add a bpf fq qdisc to selftest

Amery Hung <ameryhung@xxxxxxxxx> · Fri, 19 Jul 2024 11:20:55 -0700

On Thu, Jul 18, 2024 at 6:54 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 7/14/24 10:51 AM, Amery Hung wrote:
> > +struct {
> > +     __uint(type, BPF_MAP_TYPE_ARRAY);
> > +     __type(key, __u32);
> > +     __type(value, struct fq_stashed_flow);
> > +     __uint(max_entries, NUM_QUEUE + 1);
> > +} fq_stashed_flows SEC(".maps");
> > +
> > +#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
> > +
> > +private(A) struct bpf_spin_lock fq_delayed_lock;
> > +private(A) struct bpf_rb_root fq_delayed __contains(fq_flow_node, rb_node);
> > +
> > +private(B) struct bpf_spin_lock fq_new_flows_lock;
> > +private(B) struct bpf_list_head fq_new_flows __contains(fq_flow_node, list_node);
> > +
> > +private(C) struct bpf_spin_lock fq_old_flows_lock;
> > +private(C) struct bpf_list_head fq_old_flows __contains(fq_flow_node, list_node);
>
> Potentially, multiple qdisc instances will content on these global locks. Do you
> think it will be an issue in setup like the root is mq and multiple fq(s) below
> the mq, like mq => (fq1, fq2, fq3...)?
>
> I guess it could be solved by storing them into the map's value and each fq
> instance uses its own lock and list/rb (?) to make it work like ".priv_size",
> but just more work is needed in ".init". Not necessary the top of the things to
> tackle/optimize for now though.
>

The examples in selftests indeed do not work well for mq as they share
a global queue.

Just thinking on a higher level. A solution could be introducing some
semantics to bpf to annotate maps or graphs to be private and backed
by per-qdisc privdata, so that users don't need to specify the qdisc
everytime when accessing private data. In addition, maybe the reset
mechanism can piggyback on this.

Though it is not the highest priority, I think the final code is
selftests should use independent queues if under mq. I will fix it in
some future revisions.

> [ ... ]
>
> > +SEC("struct_ops/bpf_fq_enqueue")
> > +int BPF_PROG(bpf_fq_enqueue, struct sk_buff *skb, struct Qdisc *sch,
> > +          struct bpf_sk_buff_ptr *to_free)
> > +{
> > +     struct fq_flow_node *flow = NULL, *flow_copy;
> > +     struct fq_stashed_flow *sflow;
> > +     u64 time_to_send, jiffies;
> > +     u32 hash, sk_hash;
> > +     struct skb_node *skbn;
> > +     bool connected;
> > +
> > +     if (fq_qlen >= q_plimit)
> > +             goto drop;
> > +
> > +     if (!skb->tstamp) {
> > +             time_to_send = ktime_cache = bpf_ktime_get_ns();
> > +     } else {
> > +             if (fq_packet_beyond_horizon(skb)) {
> > +                     ktime_cache = bpf_ktime_get_ns();
> > +                     if (fq_packet_beyond_horizon(skb)) {
> > +                             if (q_horizon_drop)
> > +                                     goto drop;
> > +
> > +                             skb->tstamp = ktime_cache + q_horizon;
> > +                     }
> > +             }
> > +             time_to_send = skb->tstamp;
> > +     }
> > +
> > +     if (fq_classify(skb, &hash, &sflow, &connected, &sk_hash) < 0)
> > +             goto drop;
> > +
> > +     flow = bpf_kptr_xchg(&sflow->flow, flow);
> > +     if (!flow)
> > +             goto drop;
> > +
> > +     if (hash != PRIO_QUEUE) {
> > +             if (connected && flow->socket_hash != sk_hash) {
>
> The commit message mentioned it does not handle the hash collision. Not a
> request for now, I just want to understand if you hit some issues.

IIRC, when I used hashmap for fq_stashed_flows, there were some false
negatives from the verifier. So I simplified the implementation by
rehashing flow hash to 10 bits and using an arraymap instead. Let me
fix this and see if there are fundamental issues.

>
> > +                     flow->credit = q_initial_quantum;
> > +                     flow->socket_hash = sk_hash;
> > +                     if (fq_flow_is_throttled(flow)) {
> > +                             /* mark the flow as undetached. The reference to the
> > +                              * throttled flow in fq_delayed will be removed later.
> > +                              */
> > +                             flow_copy = bpf_refcount_acquire(flow);
> > +                             flow_copy->age = 0;
> > +                             fq_flows_add_tail(&fq_old_flows, &fq_old_flows_lock, flow_copy);
> > +                     }
> > +                     flow->time_next_packet = 0ULL;
> > +             }
> > +
> > +             if (flow->qlen >= q_flow_plimit) {
> > +                     bpf_kptr_xchg_back(&sflow->flow, flow);
> > +                     goto drop;
> > +             }
> > +
> > +             if (fq_flow_is_detached(flow)) {
> > +                     if (connected)
> > +                             flow->socket_hash = sk_hash;
> > +
> > +                     flow_copy = bpf_refcount_acquire(flow);
> > +
> > +                     jiffies = bpf_jiffies64();
> > +                     if ((s64)(jiffies - (flow_copy->age + q_flow_refill_delay)) > 0) {
> > +                             if (flow_copy->credit < q_quantum)
> > +                                     flow_copy->credit = q_quantum;
> > +                     }
> > +                     flow_copy->age = 0;
> > +                     fq_flows_add_tail(&fq_new_flows, &fq_new_flows_lock, flow_copy);
> > +             }
> > +     }
> > +
> > +     skbn = bpf_obj_new(typeof(*skbn));
> > +     if (!skbn) {
> > +             bpf_kptr_xchg_back(&sflow->flow, flow)
> Please post the patch that makes the bpf_kptr_xchg() work. It is easier if I can
> try the selftests out.
>

The offlist RFC patchset from Dave Marchevsky is now in reply to the
cover letter for people interested to try out. I am also copying Dave
here.

Thanks for reviewing,
Amery