> On Jan 12, 2024, at 11:05 AM, Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> From: Martin KaFai Lau <martin.lau@xxxxxxxxxx>
>
> There is a bug in the bpf_iter_udp_batch() function that stops
> the userspace from making forward progress.
>
> The case that triggers the bug is the userspace passed in
> a very small read buffer. When the bpf prog does bpf_seq_printf,
> the userspace read buffer is not enough to capture the whole bucket.
>
> When the read buffer is not large enough, the kernel will remember
> the offset of the bucket in iter->offset such that the next userspace
> read() can continue from where it left off.
>
> The kernel will skip the number (== "iter->offset") of sockets in
> the next read(). However, the code directly decrements the
> "--iter->offset". This is incorrect because the next read() may
> not consume the whole bucket either and then the next-next read()
> will start from offset 0. The net effect is the userspace will
> keep reading from the beginning of a bucket and the process will
> never finish. "iter->offset" must always go forward until the
> whole bucket is consumed.
>
> This patch fixes it by using a local variable "resume_offset"
> and "resume_bucket". "iter->offset" is always reset to 0 before
> it may be used. "iter->offset" will be advanced to the
> "resume_offset" when it continues from the "resume_bucket" (i.e.
> "state->bucket == resume_bucket"). This brings it closer to
> the bpf_iter_tcp's offset handling which does not suffer
> the same bug.
>
> Cc: Aditi Ghag <aditi.ghag@xxxxxxxxxxxxx>
> Fixes: c96dac8d369f ("bpf: udp: Implement batching for sockets iterator")
> Acked-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> Signed-off-by: Martin KaFai Lau <martin.lau@xxxxxxxxxx>

Reviewed-by: Aditi Ghag <aditi.ghag@xxxxxxxxxxxxx>

Thanks!

> ---
> net/ipv4/udp.c | 21 ++++++++++-----------
> 1 file changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 978b83d3c094..04c534a9ef89 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -3137,16 +3137,18 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>  	struct bpf_udp_iter_state *iter = seq->private;
>  	struct udp_iter_state *state = &iter->state;
>  	struct net *net = seq_file_net(seq);
> +	int resume_bucket, resume_offset;
>  	struct udp_table *udptable;
>  	unsigned int batch_sks = 0;
>  	bool resized = false;
>  	struct sock *sk;
>
> +	resume_bucket = state->bucket;
> +	resume_offset = iter->offset;
> +
>  	/* The current batch is done, so advance the bucket. */
> -	if (iter->st_bucket_done) {
> +	if (iter->st_bucket_done)
>  		state->bucket++;
> -		iter->offset = 0;
> -	}
>
>  	udptable = udp_get_table_seq(seq, net);
>
> @@ -3166,19 +3168,19 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>  	for (; state->bucket <= udptable->mask; state->bucket++) {
>  		struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
>
> -		if (hlist_empty(&hslot2->head)) {
> -			iter->offset = 0;
> +		if (hlist_empty(&hslot2->head))
>  			continue;
> -		}
>
> +		iter->offset = 0;
>  		spin_lock_bh(&hslot2->lock);
>  		udp_portaddr_for_each_entry(sk, &hslot2->head) {
>  			if (seq_sk_match(seq, sk)) {
>  				/* Resume from the last iterated socket at the
>  				 * offset in the bucket before iterator was stopped.
>  				 */
> -				if (iter->offset) {
> -					--iter->offset;
> +				if (state->bucket == resume_bucket &&
> +				    iter->offset < resume_offset) {

I like this invariant of ensuring that the batching and resume operations
are performed for the same bucket under consideration.

> +					++iter->offset;
>  					continue;
>  				}
>  				if (iter->end_sk < iter->max_sk) {
> @@ -3192,9 +3194,6 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>
>  		if (iter->end_sk)
>  			break;
> -
> -		/* Reset the current bucket's offset before moving to the next bucket. */
> -		iter->offset = 0;
>  	}
>
>  	/* All done: no batch made. */
> --
> 2.34.1
>
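
Since the resume invariant is easier to see outside the kernel, below is a
minimal userspace sketch of the same logic. Everything in it is made up for
illustration (NBUCKETS, BATCH_MAX, bucket_len, read_batch(), struct
iter_state are all hypothetical names, and the direct-emission loop is a
simplification of the kernel's batch-then-show machinery); only the
resume_bucket/resume_offset handling mirrors the patch:

#include <stdio.h>

#define NBUCKETS	3
#define BATCH_MAX	2	/* pretend each read() only fits 2 entries */

/* stand-in for the hash table: number of matching sockets per bucket */
static const int bucket_len[NBUCKETS] = { 5, 0, 3 };

struct iter_state {
	int bucket;	/* like state->bucket */
	int offset;	/* entries already consumed in this bucket */
};

/* One read(): emit up to BATCH_MAX entries, resuming from iter->offset.
 * Returns the number of entries emitted; 0 means iteration is done.
 */
static int read_batch(struct iter_state *iter)
{
	int resume_bucket = iter->bucket;
	int resume_offset = iter->offset;
	int emitted = 0;

	for (; iter->bucket < NBUCKETS; iter->bucket++) {
		int i;

		if (!bucket_len[iter->bucket])
			continue;

		/* Always reset before walking a bucket... */
		iter->offset = 0;
		for (i = 0; i < bucket_len[iter->bucket]; i++) {
			/* ...and only skip forward while still in the
			 * resume bucket, advancing the offset as we skip.
			 */
			if (iter->bucket == resume_bucket &&
			    iter->offset < resume_offset) {
				++iter->offset;
				continue;
			}
			if (emitted == BATCH_MAX)
				return emitted;	/* short read: stop mid-bucket */
			printf("bucket %d entry %d\n", iter->bucket, i);
			++iter->offset;
			++emitted;
		}
	}

	return emitted;
}

int main(void)
{
	struct iter_state iter = { 0, 0 };

	while (read_batch(&iter))
		;	/* terminates: the offset only moves forward */

	return 0;
}

Running this prints all eight entries exactly once across four short reads.
The forward-only offset is the invariant that matters: with the old
decrementing skip, a partially consumed bucket could be replayed from the
beginning on a later read, so iteration never finished.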