The 11/17/2022 16:31, Alexander Lobakin wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
> From: Horatiu Vultur <horatiu.vultur@xxxxxxxxxxxxx>
> Date: Wed, 16 Nov 2022 21:55:57 +0100
>
> > The 11/16/2022 16:34, Alexander Lobakin wrote:
> > >
> > > From: Horatiu Vultur <horatiu.vultur@xxxxxxxxxxxxx>
> > > Date: Tue, 15 Nov 2022 22:44:55 +0100
> >
> > Hi Olek,
>
> Hi!
>
> > > For %XDP_REDIRECT, as you don't know the source of the XDP frame,
> >
> > Why don't I know the source?
> > Will it not be from an RX page that is allocated by Page Pool?
>
> Imagine some NIC which does not use Page Pool, for example because it
> does its own page allocation / splitting / recycling, and it gets
> %XDP_REDIRECT when running an XDP prog on Rx. devmap says it must
> redirect the frame to your NIC.
> Then your ::ndo_xdp_xmit() will be run on a frame/page not belonging
> to any Page Pool.
> The example can be any of the Intel drivers (there are plans to switch
> at least i40e and ice to Page Pool, but they're always deep in the
> backlogs (clownface)).

Silly me, I was only ever thinking about and testing traffic from one
lan966x port to another lan966x port. Of course the frame can come from
other NICs.

> > > you need to unmap it (as it was previously mapped in
> > > ::ndo_xdp_xmit()), plus call xdp_return_frame{,_bulk} to free the
> > > XDP frame. Note that the _rx_napi() variant is not applicable here.
> > >
> > > That description might be confusing, so you can take a look at the
> > > already existing code[0] to get the idea. I think this piece shows
> > > the expected logic rather well.
> >
> > I think you forgot to include the link to the code.
> > I also looked at different drivers, but I couldn't figure out why the
> > frame needs to be mapped and where that happens.
>
> Ooof, really.
> Pls look at the end of this reply :D
>
> On ::ndo_xdp_xmit(), as I explained above, you can receive a frame
> from any driver or from BPF core code (such as cpumap), and the BPF
> prog there could have been run on a buffer of any kind: a Page Pool
> page, just a page, a kmalloc() chunk and so on.
>
> So, in the code[0], you can see the following set of operations:
>
> * DMA unmap in all cases except a frame coming from %XDP_TX (then
>   it was only synced);
> * updating statistics and freeing the skb for skb cases;
> * xdp_return_frame_rx_napi() for %XDP_TX cases;
> * xdp_return_frame_bulk() for ::ndo_xdp_xmit() cases.

Thanks for the detailed explanation and for the link :D
I will update all of this in the next version.

> > > > +	ifh = page_address(page) + XDP_PACKET_HEADROOM;
> > > > +	memset(ifh, 0x0, sizeof(__be32) * IFH_LEN);
> > > > +	lan966x_ifh_set_bypass(ifh, 1);
> > > > +	lan966x_ifh_set_port(ifh, BIT_ULL(port->chip_port));
> > > > +
> > > > +	dma_addr = page_pool_get_dma_addr(page);
> > > > +	dma_sync_single_for_device(lan966x->dev, dma_addr + XDP_PACKET_HEADROOM,
> > > > +				   xdpf->len + IFH_LEN_BYTES,
> > > > +				   DMA_TO_DEVICE);
> > >
> > > Also not correct. This page was mapped with %DMA_FROM_DEVICE in the
> > > Rx code, and now you sync it for the opposite direction.
> > > Most drivers, in case XDP is enabled, create Page Pools with ::dma_dir
> > > set to %DMA_BIDIRECTIONAL. Then you would need only to sync it here
> > > with the same direction (bidir) and that's it.
> >
> > That is a really good catch!
> > I was wondering why things were working when I tested this, because I
> > could definitely see the right behaviour.
>
> The reasons can be:
>
> 1) your platform might have a DMA coherence engine, so that all
>    those DMA sync calls are no-ops;
> 2) on your platform, DMA writeback (TO_DEVICE) and DMA invalidate
>    (FROM_DEVICE) invoke the same operation/instruction.
>    Some hardware is designed that way, that any DMA sync is in fact
>    a bidir synchronization;
> 3) if there were no frame modifications from the kernel, e.g. you
>    received it and immediately sent it, cache was not polluted with
>    some pending modifications, so there was no work for writeback;
> 4) probably something else I might've missed.
>
> > > > +
> > > > +	/* Setup next dcb */
> > > > +	lan966x_fdma_tx_setup_dcb(tx, next_to_use, xdpf->len + IFH_LEN_BYTES,
> > > > +				  dma_addr + XDP_PACKET_HEADROOM);
> > > > +
> > > > +	/* Fill up the buffer */
> > > > +	next_dcb_buf = &tx->dcbs_buf[next_to_use];
> > > > +	next_dcb_buf->skb = NULL;
> > > > +	next_dcb_buf->page = page;
> > > > +	next_dcb_buf->len = xdpf->len + IFH_LEN_BYTES;
> > > > +	next_dcb_buf->dma_addr = dma_addr;
> > > > +	next_dcb_buf->used = true;
> > > > +	next_dcb_buf->ptp = false;
> > > > +	next_dcb_buf->dev = port->dev;
> > > > +
> > > > +	/* Start the transmission */
> > > > +	lan966x_fdma_tx_start(tx, next_to_use);
> > > > +
> > > > +out:
> > > > +	spin_unlock(&lan966x->tx_lock);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > >  int lan966x_fdma_xmit(struct sk_buff *skb, __be32 *ifh, struct net_device *dev)
> > > >  {
> > > >  	struct lan966x_port *port = netdev_priv(dev);
> > > > @@ -709,6 +776,7 @@ int lan966x_fdma_xmit(struct sk_buff *skb, __be32 *ifh, struct net_device *dev)
> > > >  	/* Fill up the buffer */
> > > >  	next_dcb_buf = &tx->dcbs_buf[next_to_use];
> > > >  	next_dcb_buf->skb = skb;
> > > > +	next_dcb_buf->page = NULL;
> > > >  	next_dcb_buf->len = skb->len;
> > > >  	next_dcb_buf->dma_addr = dma_addr;
> > > >  	next_dcb_buf->used = true;
> > >
> > > [...]
> > >
> > > --
> > > 2.38.0
> > >
> > > Thanks,
> > > Olek
> >
> > --
> > /Horatiu
>
> [0] https://elixir.bootlin.com/linux/v6.1-rc5/source/drivers/net/ethernet/marvell/mvneta.c#L1882
>
> Thanks,
> Olek

--
/Horatiu