> > Thanks very much!
> > You reminded me, I always started the pktgen script first and then ran
> > the xdp2 program in the previous tests. So I saw that the transmit speed
> > of the generator was always greater than the speed of XDP_TX when I
> > stopped the script. But actually, the real-time transmit speed of the
> > generator had degraded to equal the speed of XDP_TX.
> >
>
> Good that we finally found the root-cause, that explains why it seems our
> code changes didn't have any effect. The generator gets affected and
> slowed down due to the traffic that is bounced back to it. (I tried to
> hint this earlier with the Ethernet Flow-Control settings).
>
> > So I turned off the rx function of the generator to avoid increasing
> > the CPU load of the generator due to the returned traffic from xdp2.
>
> How did you turn off the rx function of the generator?
> (I have a couple of tricks I use)
>

Actually, I didn't really disable the rx function of the generator, I just
made the generator hardware automatically discard the returned traffic from
xdp2. I used the MAC filter feature of the hardware and modified the pktgen
script so that the SMAC of the packets differs from the MAC address of the
generator.

> > And I tested
> > the performance again. Below are the results.
> >
> > Result 1: current method
> > root@imx8mpevk:~# ./xdp2 eth0
> > proto 17:     326539 pkt/s
> > proto 17:     326464 pkt/s
> > proto 17:     326528 pkt/s
> > proto 17:     326465 pkt/s
> > proto 17:     326550 pkt/s
> >
> > Result 2: sync_dma_len method
> > root@imx8mpevk:~# ./xdp2 eth0
> > proto 17:     353918 pkt/s
> > proto 17:     352923 pkt/s
> > proto 17:     353900 pkt/s
> > proto 17:     352672 pkt/s
> > proto 17:     353912 pkt/s
> >
>
> This looks more promising:
>  ((353912/326550)-1)*100 = 8.38% faster.
>
> Or gaining/saving approx 237 nanosec per packet
>  ((1/326550-1/353912)*10^9).
>
> > Note: the speed of the generator is about 935397pps.
> >
> > Compared result 1 with result 2.
> > The "sync_dma_len" method actually improves the performance of XDP_TX,
> > so the conclusion from the previous tests is *incorrect*.
> > I'm so sorry for that. :(
> >
>
> I'm happy that we finally found the root-cause.
> Thanks for doing all the requested tests I asked for.
>
> > In addition, I also tried the "dma_sync_len" + not use
> > xdp_convert_buff_to_frame() method, and the performance has been further
> > improved. Below is the result.
> >
> > Result 3: sync_dma_len + not use xdp_convert_buff_to_frame() method
> > root@imx8mpevk:~# ./xdp2 eth0
> > proto 17:     369261 pkt/s
> > proto 17:     369267 pkt/s
> > proto 17:     369206 pkt/s
> > proto 17:     369214 pkt/s
> > proto 17:     369126 pkt/s
> >
> > Therefore, I intend to use the "dma_sync_len" + not use
> > xdp_convert_buff_to_frame() method in the V5 patch. Thank you again,
> > Jesper and Jakub. You really helped me a lot. :)
> >
>
> I suggest that the V5 patch still use xdp_convert_buff_to_frame(), and
> then you send a followup patch (or a 2/2 patch) that removes the use of
> xdp_convert_buff_to_frame() for XDP_TX. This way it is easier to keep
> track of the changes and improvements.
>

Okay, I will do it.

> I would be very interested in knowing if the MMIO test changes after this
> correction to the testlab/generator.
>

The performance is significantly improved, as you expected, but as I
explained before, I'm not sure whether there are potential risks other than
increased latency. So I'm not going to modify it at the moment. Below is the
result after I changed the logic to do an MMIO write on the rx-BDR and tx-BDR
respectively at the end of the NAPI callback.

root@imx8mpevk:~# ./xdp2 eth0
proto 17:     436020 pkt/s
proto 17:     436167 pkt/s
proto 17:     434205 pkt/s
proto 17:     436140 pkt/s
proto 17:     436115 pkt/s
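
For reference, the relative gain and per-packet saving quoted earlier in this
thread can be recomputed with a small script. This is only a sketch: the
helper name compare() is made up for illustration, it takes the last pkt/s
sample of each run (as the quoted calculation does), and it compares every
run against Result 1, which is an assumption for the MMIO test since the
thread does not say which code base that run was built on.

```python
def compare(base_pps, new_pps):
    """Return (percent speedup, nanoseconds saved per packet)."""
    gain_pct = (new_pps / base_pps - 1.0) * 100.0
    saved_ns = (1.0 / base_pps - 1.0 / new_pps) * 1e9
    return gain_pct, saved_ns


# Last pkt/s sample of each run reported in this thread.
BASELINE = 326550  # Result 1: current method
RUNS = {
    "sync_dma_len": 353912,
    "sync_dma_len, no xdp_convert_buff_to_frame()": 369126,
    # Assumption: compared against Result 1 like the others.
    "MMIO write at end of NAPI callback": 436115,
}

if __name__ == "__main__":
    for name, pps in RUNS.items():
        gain_pct, saved_ns = compare(BASELINE, pps)
        print(f"{name}: {gain_pct:.2f}% faster, "
              f"{saved_ns:.1f} ns/packet saved")
```

Averaging the five samples of each run instead of taking the last one would
be a reasonable alternative; the samples are close enough that the conclusion
should not change.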