Re: How does the Kernel decide which Umem frame to choose for the next packet?

On Mon, May 18, 2020 at 11:17 AM Gaul, Maximilian
<maximilian.gaul@xxxxxx> wrote:
>
> > User-space decides this by what frames it enters into the fill ring.
> > Kernel-space uses the frames in order from that ring.
> >
> > /Magnus
>
> Thank you for your reply Magnus,
>
> I am sorry to ask again, but I am still not sure when this happens.
> So I first check my socket's RX-ring for new packets:
>
>                 xsk_ring_cons__peek(&xsk_socket->rx, 1024, &idx_rx)
>
> which looks like this:
>
>                 static inline size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons,
>                                                          size_t nb, __u32 *idx)
>                 {
>                         size_t entries = xsk_cons_nb_avail(cons, nb);
>
>                         if (entries > 0) {
>                                 /* Make sure we do not speculatively read the data before
>                                  * we have received the packet buffers from the ring.
>                                  */
>                                 libbpf_smp_rmb();
>
>                                 *idx = cons->cached_cons;
>                                 cons->cached_cons += entries;
>                         }
>
>                         return entries;
>                 }
>
> where `idx_rx` is the starting position of descriptors for the new packets in the RX-ring.
>
> My first question here is: How can there already be packet descriptors in my RX-ring if I haven't put any frames into the fill ring of the umem yet?
> So I assume libbpf already did this for me?

Yes, that is correct.
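
To make this concrete: the setup code populates the fill ring once,
before any packet arrives. A sketch of that initial population,
modeled on the xdpsock sample (so `umem_info->fq` matching your
naming, and the libbpf default constants from <bpf/xsk.h>, are
assumptions on my part):

        __u32 idx;
        int i;

        /* Reserve a slot for every default descriptor... */
        if (xsk_ring_prod__reserve(&umem_info->fq,
                                   XSK_RING_PROD__DEFAULT_NUM_DESCS, &idx) !=
            XSK_RING_PROD__DEFAULT_NUM_DESCS)
                exit(EXIT_FAILURE);

        /* ...write one umem frame address per slot, so the NIC has
         * buffers to receive into... */
        for (i = 0; i < XSK_RING_PROD__DEFAULT_NUM_DESCS; i++)
                *xsk_ring_prod__fill_addr(&umem_info->fq, idx++) =
                        i * XSK_UMEM__DEFAULT_FRAME_SIZE;

        /* ...and make them visible to the kernel. */
        xsk_ring_prod__submit(&umem_info->fq, XSK_RING_PROD__DEFAULT_NUM_DESCS);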

> After this call I know how many packets are waiting. So I reserve exactly as many Umem frames:
>
>                 xsk_ring_prod__reserve(&umem_info->fq, rx_rcvd_amnt, &idx_fq);
>
> which looks like this:
>
>                 static inline size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod,
>                                                                 size_t nb, __u32 *idx)
>                 {
>                         if (xsk_prod_nb_free(prod, nb) < nb)
>                                 return 0;
>
>                         *idx = prod->cached_prod;
>                         prod->cached_prod += nb;
>
>                         return nb;
>                 }
>
> But what exactly am I reserving here? How can I reserve anything from the Umem without telling it which RX-ring my socket uses?

You are reserving descriptor slots in a producer ring.
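
The fill ring belongs to the umem, not to the socket, which is why no
RX-ring needs to be named. You reserve slots in that ring, write umem
frame addresses into them, and submit. A typical RX path recycles the
frames it has just consumed, roughly like this (an untested sketch
using your variable names and the helpers from <bpf/xsk.h>):

        __u32 idx_rx = 0, idx_fq = 0;
        size_t i, rcvd;

        rcvd = xsk_ring_cons__peek(&xsk_socket->rx, 64, &idx_rx);
        if (!rcvd)
                return;

        /* Reserve one fill-ring slot per received packet. */
        while (xsk_ring_prod__reserve(&umem_info->fq, rcvd, &idx_fq) != rcvd)
                ; /* fill ring full; a real loop would poll or back off here */

        for (i = 0; i < rcvd; i++) {
                const struct xdp_desc *desc =
                        xsk_ring_cons__rx_desc(&xsk_socket->rx, idx_rx + i);

                /* ... process the packet at desc->addr, desc->len ... */

                /* Return the frame so the kernel can reuse it. In the
                 * default aligned mode desc->addr can be recycled as-is;
                 * unaligned mode needs xsk_umem__extract_addr(). */
                *xsk_ring_prod__fill_addr(&umem_info->fq, idx_fq + i) =
                        desc->addr;
        }

        xsk_ring_prod__submit(&umem_info->fq, rcvd);
        xsk_ring_cons__release(&xsk_socket->rx, rcvd);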

> After this, I extract the RX-ring packet descriptors, starting at `idx_rx`:
>
>                 const struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk_socket->rx, idx_rx + i);
>
> I am also not entirely certain about the zero-copy aspect of AF_XDP. As far as I know, the NIC writes incoming packets via DMA directly into system memory. But in this case system memory means the Umem area - right? Whereas with non-zero-copy, the packets land somewhere else in memory and the Kernel first has to copy them into the Umem area?

In zero-copy mode, the NIC DMAs the packets straight into the umem, so
they are immediately visible to the user-space process.
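
Whether you actually run in zero-copy mode is decided when the socket
is bound. If you want to be certain, rather than silently falling back
to copy mode, you can request it via the bind flags - a sketch, reusing
your interface name and assuming queue id 0:

        #include <linux/if_xdp.h>       /* XDP_ZEROCOPY, XDP_COPY */

        struct xsk_socket_config cfg = {
                .rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
                .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
                .bind_flags = XDP_ZEROCOPY, /* fail if the driver can't */
        };

        if (xsk_socket__create(&xsk, "eth20", 0 /* queue id */, umem,
                               &rx, &tx, &cfg))
                /* No zero-copy support; XDP_COPY would still work. */
                exit(EXIT_FAILURE);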

> I am also a bit confused about what the size of an RX-queue means in this context. Given this output of ethtool:
>
>                 $ ethtool -g eth20
>                 Ring parameters for eth20:
>                 Pre-set maximums:
>                 RX:             8192
>                 RX Mini:        0
>                 RX Jumbo:       0
>                 TX:             8192
>                 Current hardware settings:
>                 RX:             1024
>                 RX Mini:        0
>                 RX Jumbo:       0
>                 TX:             1024
>
> Does this mean that at the moment my NIC can store 1024 incoming packets inside its own memory?

The NIC does not have its own memory. This just means that there can
be 1024 packets in flight: packets that are about to be processed by
the NIC, or that have been processed by the NIC but not yet handled by
the driver. Nothing you need to care about unless you are optimizing
performance, or writing a driver of course :-).
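
If you ever want to read those ring sizes from inside your program,
the same information that ethtool -g prints is available through the
SIOCETHTOOL ioctl - an untested sketch:

        #include <linux/ethtool.h>
        #include <linux/sockios.h>
        #include <net/if.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/ioctl.h>
        #include <sys/socket.h>
        #include <unistd.h>

        int main(void)
        {
                struct ethtool_ringparam ring = { .cmd = ETHTOOL_GRINGPARAM };
                struct ifreq ifr = { 0 };
                int fd = socket(AF_INET, SOCK_DGRAM, 0);

                strncpy(ifr.ifr_name, "eth20", IFNAMSIZ - 1);
                ifr.ifr_data = (char *)&ring;

                if (ioctl(fd, SIOCETHTOOL, &ifr) == 0)
                        printf("RX: %u (max %u)\n",
                               ring.rx_pending, ring.rx_max_pending);

                close(fd);
                return 0;
        }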

> So there is no connection between the RX-queue size of the NIC and the Umem area?

Correct.
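
The umem size is purely an application choice, along these lines (a
sketch; NUM_FRAMES is an arbitrary number I picked, with no relation
to the 1024 reported by ethtool):

        #include <bpf/xsk.h>
        #include <stdlib.h>
        #include <unistd.h>

        #define NUM_FRAMES 4096 /* app choice, unrelated to the NIC ring */

        void *buf;
        struct xsk_umem *umem;
        struct xsk_ring_prod fq;
        struct xsk_ring_cons cq;
        size_t size = (size_t)NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;

        if (posix_memalign(&buf, getpagesize(), size))
                exit(EXIT_FAILURE);

        /* NULL config: libbpf picks default fill/completion ring sizes. */
        if (xsk_umem__create(&umem, buf, size, &fq, &cq, NULL))
                exit(EXIT_FAILURE);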

/Magnus

> Sorry for this wall of text. Maybe you can answer a few of my questions; I hope they are not too confusing.
>
> Thank you so much
>
> Max



