On Tue, Nov 12, 2024, at 9:43 AM, Alexander Lobakin wrote:
> From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
> Date: Tue, 22 Oct 2024 17:51:43 +0200
>
>> From: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
>> Date: Wed, 9 Oct 2024 14:50:42 +0200
>>
>>> From: Lorenzo Bianconi <lorenzo@xxxxxxxxxx>
>>> Date: Wed, 9 Oct 2024 14:47:58 +0200
>>>
>>>>> From: Lorenzo Bianconi <lorenzo@xxxxxxxxxx>
>>>>> Date: Wed, 9 Oct 2024 12:46:00 +0200
>>>>>
>>>>>>> Hi Lorenzo,
>>>>>>>
>>>>>>> On Mon, Sep 16, 2024 at 12:13:42PM GMT, Lorenzo Bianconi wrote:
>>>>>>>> Add GRO support to cpumap codebase moving the cpu_map_entry kthread to a
>>>>>>>> NAPI-kthread pinned on the selected cpu.
>>>>>>>>
>>>>>>>> Changes in rfc v2:
>>>>>>>> - get rid of dummy netdev dependency
>>>>>>>>
>>>>>>>> Lorenzo Bianconi (3):
>>>>>>>>   net: Add napi_init_for_gro routine
>>>>>>>>   net: add napi_threaded_poll to netdevice.h
>>>>>>>>   bpf: cpumap: Add gro support
>>>>>>>>
>>>>>>>>  include/linux/netdevice.h |   3 +
>>>>>>>>  kernel/bpf/cpumap.c       | 123 ++++++++++++++++----------------------
>>>>>>>>  net/core/dev.c            |  27 ++++++---
>>>>>>>>  3 files changed, 73 insertions(+), 80 deletions(-)
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.46.0
>>>>>>>>
>>>>>>>
>>>>>>> Sorry about the long delay - finally caught up to everything after
>>>>>>> conferences.
>>>>>>>
>>>>>>> I re-ran my synthetic tests (including baseline). v2 is somehow showing
>>>>>>> 2x bigger gains than v1 (~30% vs ~14%) for tcp_stream. Again, the only
>>>>>>> variable I changed is kernel version - steering prog is active for both.
>>>>>>>
>>>>>>>
>>>>>>> Baseline (again)
>>>>>>>
>>>>>>> ./tcp_rr -c -H $TASK_IP -p 50,90,99 -T4 -F8 -l30                          ./tcp_stream -c -H $TASK_IP -T8 -F16 -l30
>>>>>>>
>>>>>>>          Transactions  Latency P50 (s)  Latency P90 (s)  Latency P99 (s)           Throughput (Mbit/s)
>>>>>>> Run 1    2560252       0.00009087       0.00010495       0.00011647       Run 1    15479.31
>>>>>>> Run 2    2665517       0.00008575       0.00010239       0.00013311       Run 2    15162.48
>>>>>>> Run 3    2755939       0.00008191       0.00010367       0.00012287       Run 3    14709.04
>>>>>>> Run 4    2595680       0.00008575       0.00011263       0.00012671       Run 4    15373.06
>>>>>>> Run 5    2841865       0.00007999       0.00009471       0.00012799       Run 5    15234.91
>>>>>>> Average  2683850.6     0.000084854      0.00010367       0.00012543       Average  15191.76
>>>>>>>
>>>>>>> cpumap NAPI patches v2
>>>>>>>
>>>>>>>          Transactions  Latency P50 (s)  Latency P90 (s)  Latency P99 (s)           Throughput (Mbit/s)
>>>>>>> Run 1    2577838       0.00008575       0.00012031       0.00013695       Run 1    19914.56
>>>>>>> Run 2    2729237       0.00007551       0.00013311       0.00017663       Run 2    20140.92
>>>>>>> Run 3    2689442       0.00008319       0.00010495       0.00013311       Run 3    19887.48
>>>>>>> Run 4    2862366       0.00008127       0.00009471       0.00010623       Run 4    19374.49
>>>>>>> Run 5    2700538       0.00008319       0.00010367       0.00012799       Run 5    19784.49
>>>>>>> Average  2711884.2     0.000081782      0.00011135       0.000136182      Average  19820.388
>>>>>>> Delta    1.04%         -3.62%           7.41%            8.57%                     30.47%
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Daniel
>>>>>>
>>>>>> Hi Daniel,
>>>>>>
>>>>>> cool, thx for testing it.
>>>>>>
>>>>>> @Olek: how do we want to proceed on it? Are you still working on it or do you want me
>>>>>> to send a regular patch for it?
>>>>>
>>>>> Hi,
>>>>>
>>>>> I had a small vacation, sorry. I'm starting working on it again today.
>>>>
>>>> ack, no worries. Are you going to rebase the other patches on top of it
>>>> or are you going to try a different approach?
>>>
>>> I'll try the approach without NAPI as Kuba asks and let Daniel test it,
>>> then we'll see.
>>
>> For now, I have the same results without NAPI as with your series, so
>> I'll push it soon and let Daniel test.
>>
>> (I simply decoupled GRO and NAPI and used the former in cpumap, but the
>> kthread logic didn't change)
>>
>>>
>>> BTW I'm curious how he got this boost on v2, from what I see you didn't
>>> change the implementation that much?
>
> Hi Daniel,
>
> Sorry for the delay. Please test [0].
>
> [0] https://github.com/alobakin/linux/commits/cpumap-old
>
> Thanks,
> Olek

Ack. Will do probably early next week.