On Thu, Oct 27, 2022 at 4:58 AM shaozhengchao <shaozhengchao@xxxxxxxxxx> wrote: > > > > On 2022/10/25 1:13, Stanislav Fomichev wrote: > > On Sat, Oct 22, 2022 at 4:36 AM shaozhengchao <shaozhengchao@xxxxxxxxxx> wrote: > >> > >> > >> > >> On 2022/10/22 2:16, Stanislav Fomichev wrote: > >>> On Fri, Oct 21, 2022 at 12:25 AM shaozhengchao <shaozhengchao@xxxxxxxxxx> wrote: > >>>> > >>>> > >>>> > >>>> On 2022/10/21 1:45, Stanislav Fomichev wrote: > >>>>> On Wed, Oct 19, 2022 at 6:47 PM shaozhengchao <shaozhengchao@xxxxxxxxxx> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 2022/10/18 0:36, Stanislav Fomichev wrote: > >>>>>>> On Sat, Oct 15, 2022 at 2:16 AM Zhengchao Shao <shaozhengchao@xxxxxxxxxx> wrote: > >>>>>>>> > >>>>>>>> As [0] see, bpf_prog_test_run_skb() should allow user space to forward > >>>>>>>> 14-bytes packet via BPF_PROG_RUN instead of dropping packet directly. > >>>>>>>> So fix it. > >>>>>>>> > >>>>>>>> 0: https://github.com/cilium/ebpf/commit/a38fb6b5a46ab3b5639ea4d421232a10013596c0 > >>>>>>>> > >>>>>>>> Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len") > >>>>>>>> Signed-off-by: Zhengchao Shao <shaozhengchao@xxxxxxxxxx> > >>>>>>>> --- > >>>>>>>> net/bpf/test_run.c | 6 +++--- > >>>>>>>> 1 file changed, 3 insertions(+), 3 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c > >>>>>>>> index 13d578ce2a09..aa1b49f19ca3 100644 > >>>>>>>> --- a/net/bpf/test_run.c > >>>>>>>> +++ b/net/bpf/test_run.c > >>>>>>>> @@ -979,9 +979,6 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb) > >>>>>>>> { > >>>>>>>> struct qdisc_skb_cb *cb = (struct qdisc_skb_cb *)skb->cb; > >>>>>>>> > >>>>>>>> - if (!skb->len) > >>>>>>>> - return -EINVAL; > >>>>>>>> - > >>>>>>>> if (!__skb) > >>>>>>>> return 0; > >>>>>>>> > >>>>>>>> @@ -1102,6 +1099,9 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr, > >>>>>>>> if (IS_ERR(data)) > >>>>>>>> return PTR_ERR(data); > >>>>>>>> > >>>>>>>> + if (size == ETH_HLEN) > >>>>>>>> + is_l2 = true; > >>>>>>>> + > >>>>>>> > >>>>>>> Don't think this will work? That is_l2 is there to expose proper l2/l3 > >>>>>>> skb for specific hooks; we can't suddenly start exposing l2 headers to > >>>>>>> the hooks that don't expect it. > >>>>>>> Does it make sense to start with a small reproducer that triggers the > >>>>>>> issue first? We can have a couple of cases for > >>>>>>> len=0/ETH_HLEN-1/ETH_HLEN+1 and trigger them from the bpf program that > >>>>>>> redirects to different devices (to trigger dev_is_mac_header_xmit). > >>>>>>> > >>>>>>> > >>>>>> Hi Stanislav: > >>>>>> Thank you for your review. Is_l2 is the flag of a specific > >>>>>> hook. Therefore, do you mean that if skb->len is equal to 0, just > >>>>>> add the length back? > >>>>> > >>>>> Not sure I understand your question. All I'm saying is - you can't > >>>>> flip that flag arbitrarily. This flag depends on the attach point that > >>>>> you're running the prog against. Some attach points expect packets > >>>>> with l2, some expect packets without l2. > >>>>> > >>>>> What about starting with a small reproducer? Does it make sense to > >>>>> create a small selftest that adds net namespace + fq_codel + > >>>>> bpf_prog_test run and do redirect ingress/egress with len > >>>>> 0/1...tcphdr? Because I'm not sure I 100% understand whether it's only > >>>>> len=0 that's problematic or some other combination as well? > >>>>> > >>>> yes, only skb->len = 0 will cause null-ptr-deref issue. > >>>> The following is the process of triggering the problem: > >>>> enqueue a skb: > >>>> fq_codel_enqueue() > >>>> ... > >>>> idx = fq_codel_classify() --->if idx != 0 > >>>> flow = &q->flows[idx]; > >>>> flow_queue_add(flow, skb); --->add skb to flow[idex] > >>>> q->backlogs[idx] += qdisc_pkt_len(skb); --->backlogs = 0 > >>>> ... > >>>> fq_codel_drop() --->set sch->limit = 0, always > >>>> drop packets > >>>> ... > >>>> idx = i --->becuase backlogs in every > >>>> flows is 0, so idx = 0 > >>>> ... > >>>> flow = &q->flows[idx]; --->get idx=0 flow > >>>> ... > >>>> dequeue_head() > >>>> skb = flow->head; --->flow->head = NULL > >>>> flow->head = skb->next; --->cause null-ptr-deref > >>>> So, if skb->len !=0,fq_codel_drop() could get the correct idx, and > >>>> then skb!=NULL, it will be OK. > >>>> Maybe, I will fix it in fq_codel. > >>> > >>> I think the consensus here is that the stack, in general, doesn't > >>> expect the packets like this. So there are probably more broken things > >>> besides fq_codel. Thus, it's better if we remove the ability to > >>> generate them from the bpf side instead of fixing the individual users > >>> like fq_codel. > >>> > >>>> But, as I know, skb->len = 0 is just invalid packet. I prefer to add the > >>>> length back, like bellow: > >>>> if (is_l2 || !skb->len) > >>>> __skb_push(skb, hh_len); > >>>> is it OK? > >>> > >>> Probably not? > >>> > >>> Looking at the original syzkaller report, prog_type is > >>> BPF_PROG_TYPE_LWT_XMIT which does expect a packet without l2 header. > >>> Can we do something like: > >>> > >>> if (!is_l2 && !skb->len) { > >>> // append some dummy byte to the skb ? > >>> } > >>> > >>> > >> I pad one byte, and test OK. > >> if (!is_l2 && !skb->len) > >> __skb_push(skb, 1); > >> > >> Does it look OK to you? > > > > Nope, this will eat a byte out of the l2 header. We need to skb_put > > and make sure we allocate enough to make that skb_put succeed. > > > > But stepping back a bit: it feels like it's all unnecessary? The only > > valid use-case of this is probing for the BPF_PROG_TEST_RUN as cilium > > does. This is mostly about testing, so fixing it in the users seems > > fair? No real production code is expected to generate these zero-len > > packets. Or are we concerned that this will leak into stable kernels? > > > > I feel like we are trying to add more complexity here for no apparent reason. > > > I agree with you. users should make sure the correct skb len and > configurations are passed into kernel. Incorrect configurations should > be discarded to ensure kernel stability. > > Lorenz, Can you modify the user-mode test code? Lorenz already fixed it for Cilium. I think the discussion here is around other potential users out there. Let's wait for them to appear if it is indeed a problem?