Paolo Abeni wrote:
> On Tue, 2024-03-26 at 18:25 +0100, Richard Gobert wrote:
>> Paolo Abeni wrote:
>>> Hi,
>>>
>>> On Tue, 2024-03-26 at 16:02 +0100, Richard Gobert wrote:
>>>> This patch is meaningful by itself - removing checks against non-relevant
>>>> packets and making the flush/flush_id checks in a single place.
>>>
>>> I'm personally not sure this patch is a win. The code churn is
>>> significant. I understand this is for performance's sake, but I don't
>>> see the benefit???
>>>
>>
>> Could you clarify what you mean by code churn?
>
> The diffstat of this patch is not negligible and touches very sensitive
> areas.
>

The diff mainly touches flush/flush_id/is_atomic, and the new code should be
less complex. I agree this is sensitive, as it is part of core GRO. I checked
all relevant flows manually, but I can also write more tests to ensure the
logic remains the same.

>>> The changelog shows that perf reports slightly lower figures for
>>> inet_gro_receive(). That is expected, as this patch moves code out of
>>> that function. What about inet_gro_flush()/tcp_gro_receive(), where that
>>> code is moved?
>>>
>>
>> Please consider the following 2 common scenarios:
>>
>> 1) Multiple packets in the GRO bucket - the common case with multiple
>> packets in the bucket (e.g. running super_netperf TCP_STREAM) - each layer
>> executes a for loop - going over each packet in the bucket. Specifically,
>> L3 gro_receive loops over the bucket making flush, flush_id, is_atomic
>> checks.
>
> Only for packets with the same rx hash.
>

Right, but there are only 8 GRO buckets, so collisions can still happen with
multiple concurrent streams.

>> For most packets in the bucket, these checks are not
>> relevant (possibly also dirtying cache lines with non-relevant
>> packets). Removing code in the for loop for this case is significant.
>>
>> 2) UDP/TCP streams which do not coalesce in GRO. This is the common case
>> for regular UDP connections (e.g. running netperf UDP_STREAM). In this
>> case, GRO is just overhead. Removing any code from these layers
>> is good (shown in the first measurement in the commit message).
>
> If UDP GRO is not enabled, no UDP packets are staged in the UDP
> GRO engine; the bucket list is empty.
>
>>> Additionally, the reported deltas are within noise level according to my
>>> personal experience with similar tests.
>>>
>>
>> I've tested the difference between net-next and this patch repeatedly,
>> and the results were stable each time. Is there any specific test you
>> think would be helpful to show the result?
>
> Anything that shows a measurable gain.
>
> Reporting the CPU utilization in the inet_gro_receive() function alone
> is not enough, as part of the load has been moved into
> gro_network_flush()/tcp_gro_receive().
>

Got it. The numbers I reported were only relevant to UDP flows (so measuring
with perf top -g showed the same improvement). I'll post numbers for TCP in
v5 as well.

Thanks