Re: How does the Kernel decide which Umem frame to choose for the next packet?

> User-space decides this by what frames it enters into the fill ring.
> Kernel-space uses the frames in order from that ring.
> 
> /Magnus

Thank you for your reply, Magnus.

I am sorry to ask again, but I am still not sure when exactly this happens.
First I check the RX-ring of my socket for new packets:

		xsk_ring_cons__peek(&xsk_socket->rx, 1024, &idx_rx)

which looks like this:

		static inline size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons,
							 size_t nb, __u32 *idx)
		{
			size_t entries = xsk_cons_nb_avail(cons, nb);

			if (entries > 0) {
				/* Make sure we do not speculatively read the data before
				 * we have received the packet buffers from the ring.
				 */
				libbpf_smp_rmb();

				*idx = cons->cached_cons;
				cons->cached_cons += entries;
			}

			return entries;
		}

where `idx_rx` is the starting position of descriptors for the new packets in the RX-ring.

My first question here is: how can there already be packet descriptors in my RX-ring if I have not put any frames into the fill ring of the umem yet?
So I assume libbpf already did this for me?
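
In case it matters, this is roughly how I would have expected to hand frames to the kernel myself right after setting up the Umem (just a sketch; NUM_FRAMES and FRAME_SIZE stand for my own setup values, they are nothing libbpf defines):

		__u32 idx_fill;
		/* Hand every Umem frame to the kernel up front so it has buffers to
		 * DMA packets into; the addresses are byte offsets into the Umem. */
		size_t n = xsk_ring_prod__reserve(&umem_info->fq, NUM_FRAMES, &idx_fill);
		for (size_t i = 0; i < n; i++)
			*xsk_ring_prod__fill_addr(&umem_info->fq, idx_fill + i) = i * FRAME_SIZE;
		xsk_ring_prod__submit(&umem_info->fq, n);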

After this call I know how many packets are waiting, so I reserve exactly that many Umem frames:

		xsk_ring_prod__reserve(&umem_info->fq, rx_rcvd_amnt, &idx_fq);

which looks like this:

		static inline size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod,
								size_t nb, __u32 *idx)
		{
			if (xsk_prod_nb_free(prod, nb) < nb)
				return 0;

			*idx = prod->cached_prod;
			prod->cached_prod += nb;

			return nb;
		}

But what exactly am I reserving here? How can I reserve anything from the Umem without telling it about the RX-ring of my socket?

After this, I extract the RX-ring packet descriptors, starting at `idx_rx`:

		const struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk_socket->rx, idx_rx + i);
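
From there I read the payload directly out of the Umem and give each frame back through the fill-ring slots reserved above, roughly like this (a sketch; `umem_area` is the pointer I registered with xsk_umem__create() and process_packet() is my own function):

		/* assuming the reserve above actually returned rx_rcvd_amnt slots */
		for (__u32 i = 0; i < rx_rcvd_amnt; i++) {
			const struct xdp_desc *desc =
				xsk_ring_cons__rx_desc(&xsk_socket->rx, idx_rx + i);
			/* desc->addr is an offset into the Umem, desc->len the packet length */
			void *pkt = xsk_umem__get_data(umem_area, desc->addr);
			process_packet(pkt, desc->len);
			/* recycle the frame into the fill-ring slot reserved above */
			*xsk_ring_prod__fill_addr(&umem_info->fq, idx_fq + i) = desc->addr;
		}
		xsk_ring_prod__submit(&umem_info->fq, rx_rcvd_amnt);
		xsk_ring_cons__release(&xsk_socket->rx, rx_rcvd_amnt);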

I am also not entirely certain about the zero-copy aspect of AF_XDP. As far as I know, the NIC writes incoming packets via DMA directly into system memory. In the zero-copy case this system memory is the Umem area itself - right? Whereas without zero-copy the DMA target could be anywhere in memory, and the kernel first has to copy the packets into the Umem area?
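
As far as I can tell, the mode is selected via the bind flags when the socket is created, something like this (a sketch of the config; whether zero-copy actually works then depends on the driver, and if I understand correctly, leaving bind_flags at 0 lets the kernel fall back to copy mode):

		struct xsk_socket_config cfg = {
			.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
			.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
			.libbpf_flags = 0,
			.xdp_flags = XDP_FLAGS_DRV_MODE,	/* from <linux/if_link.h> */
			.bind_flags = XDP_ZEROCOPY,		/* or XDP_COPY to force copy mode */
		};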

I am also a bit confused about what the size of an RX-queue means in this context. Given the following output of ethtool:

		$ ethtool -g eth20
		Ring parameters for eth20:
		Pre-set maximums:
		RX:             8192
		RX Mini:        0
		RX Jumbo:       0
		TX:             8192
		Current hardware settings:
		RX:             1024
		RX Mini:        0
		RX Jumbo:       0
		TX:             1024

Does this mean that, at the moment, my NIC can store 1024 incoming packets inside its own memory? And does this mean there is no connection between the RX-queue size of the NIC and the Umem area?
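
For reference, the Umem ring sizes are configured separately when the Umem is created (these are just the libbpf defaults as far as I know), so I do not see where the 1024 would come in:

		struct xsk_umem_config ucfg = {
			.fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,		/* 2048 */
			.comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,		/* 2048 */
			.frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE,		/* 4096 bytes */
			.frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM,	/* 0 */
		};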

Sorry for this wall of text. Maybe you can answer a few of my questions; I hope they are not too confusing.

Thank you so much

Max


