On 2/28/23 6:40 PM, Aditi Ghag wrote:
+
+ if (hlist_empty(&hslot->head))
+ continue;
+
+ spin_lock_bh(&hslot->lock);
+ sk_for_each(sk, &hslot->head) {
+ if (seq_sk_match(seq, sk)) {
+ if (first) {
+ first_sk = sk;
+ first = false;
+ }
+ if (iter->end_sk < iter->max_sk) {
+ sock_hold(sk);
+ iter->batch[iter->end_sk++] = sk;
+ }
+ bucket_sks++;
+ }
+ }
+ spin_unlock_bh(&hslot->lock);
+ if (first_sk)
+ break;
+ }
+
+ /* All done: no batch made. */
+ if (!first_sk)
+ return NULL;
I think first_sk and bucket_sks also need to be reset in the "again" case?
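(For illustration only, a minimal sketch of that reset, assuming the again:
retry label and the batch-resize path live in the part of
bpf_iter_udp_batch() that is not quoted above; the variable names follow
the quoted hunk:)

again:
	/* Reset the per-scan state before rescanning, so a retry after
	 * growing the batch does not reuse stale values from the
	 * previous pass.
	 */
	first_sk = NULL;
	first = true;
	bucket_sks = 0;
	iter->end_sk = 0;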
If bpf_iter_udp_seq_stop() is called before a batch has been fully processed by the bpf prog in ".show", how does the next bpf_iter_udp_seq_start() continue from where it left off? The bpf_tcp_iter remembers the bucket and the offset-in-this-bucket. I think bpf_udp_iter can do something similar.
Hmm, I suppose you are referring to the `tcp_seek_last_pos` logic? Commit [1]
added that optimization back in v2.6, but only for TCP. Extending the same
logic to UDP seems out of scope for this PR? Although, reading the
description of [1], the latency issue seems specific to the file-based
iterators, no? Of course, BPF iterators are quite new, and I'm not sure we
have the same "slowness" reported in that commit. Having said that, do we
expect users to stop a BPF iterator and later resume from where they left
off? Regardless, I'll do some testing to ensure that we at least don't crash.
It is about ensuring the bpf-udp-iter makes forward progress, although
speeding things up is another plus. The iteration may have to be stopped
(i.e. bpf_iter_udp_seq_stop) for other reasons. The bpf-(udp|tcp|unix)-iter
is not only for doing bpf_setsockopt or bpf_sock_destroy. A major use case
is bpf_seq_printf, e.g. when seq_has_overflowed() in bpf_seq_read,
bpf_iter_udp_seq_stop will be called.
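As a concrete example of that use case, here is a minimal bpf-udp-iter
program that only dumps via BPF_SEQ_PRINTF, loosely modeled on the
bpf_iter_udp4 selftest (assuming vmlinux.h provides struct bpf_iter__udp
and libbpf's bpf_helpers.h provides BPF_SEQ_PRINTF):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

char _license[] SEC("license") = "GPL";

SEC("iter/udp")
int dump_udp(struct bpf_iter__udp *ctx)
{
	struct seq_file *seq = ctx->meta->seq;
	struct udp_sock *udp_sk = ctx->udp_sk;

	if (!udp_sk)
		return 0;

	/* One line of seq_file output per socket.  Once the seq_file
	 * buffer overflows in bpf_seq_read(), bpf_iter_udp_seq_stop()
	 * is called and the iteration later restarts from .start.
	 */
	BPF_SEQ_PRINTF(seq, "bucket %4d uid %u sport %04X\n",
		       ctx->bucket, ctx->uid,
		       bpf_ntohs(udp_sk->inet.inet_sport));
	return 0;
}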
If I read this patch correctly, bpf_iter_udp_seq_start() always starts from the
beginning of the current bucket. The bpf-iter cannot make forward progress if
bpf_iter_udp_seq_stop() is called in the middle of the bucket.
The "resume" is currently done by udp_get_idx(, *pos..) but is taken away
from bpf-udp-iter in this patch. udp_get_idx(...) starts counting from the
very first bucket and could potentially be reused here. However, I believe
it is easier for this patch to just resume from the current bucket.
Everything is there; all it needs is to remember its previous
position/offset within the bucket.
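Not what the patch does today, just a rough sketch of that idea; the field
names (e.g. offset) and the exact placement are hypothetical:

/* Hypothetical per-iterator state: remember how many matching
 * sockets of the current bucket were already handed to the bpf
 * prog before the last stop.
 */
struct bpf_udp_iter_state {
	struct udp_iter_state state;
	unsigned int cur_sk;
	unsigned int end_sk;
	unsigned int max_sk;
	int offset;		/* already-visited sockets in cur bucket */
	struct sock **batch;
};

/* In the batching loop, skip the sockets that were already shown
 * instead of always rebatching from the head of the bucket:
 */
	int resume_offset = iter->offset;

	spin_lock_bh(&hslot->lock);
	sk_for_each(sk, &hslot->head) {
		if (seq_sk_match(seq, sk)) {
			if (resume_offset > 0) {
				resume_offset--;
				continue;
			}
			if (iter->end_sk < iter->max_sk) {
				sock_hold(sk);
				iter->batch[iter->end_sk++] = sk;
			}
			bucket_sks++;
		}
	}
	spin_unlock_bh(&hslot->lock);

/* bpf_iter_udp_seq_next() would then bump iter->offset for each
 * socket it hands out and reset it to 0 when moving to the next
 * bucket, so a stop in the middle of a bucket still makes forward
 * progress on the next start.
 */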
[1] https://github.com/torvalds/linux/commit/a8b690f98baf9fb1902b8eeab801351ea603fa3a