On Mon, 25 Nov 2019 11:02:14 +0100
Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:

> BTW, do you have numbers comparing the AVX2 version with the C code? I
> quickly had a look at your numbers, but not clear to me if this is
> compared there.

No, sorry, I didn't report that anywhere, I probably should have in the
commit messages for 4/8 and 5/8. This was from v1 at 4/8, single thread
on AMD Epyc 7351, C implementation without unrolled loops:

TEST: performance
  net,port                                                      [ OK ]
    baseline (drop from netdev hook):              9971887pps
    baseline hash (non-ranged entries):            5991032pps
    baseline rbtree (match on first field only):   2666255pps
    set with 1000 full, ranged entries:            2220404pps
  port,net                                                      [ OK ]
    baseline (drop from netdev hook):             10004499pps
    baseline hash (non-ranged entries):            6011221pps
    baseline rbtree (match on first field only):   4035566pps
    set with 100 full, ranged entries:             4018240pps
  net6,port                                                     [ OK ]
    baseline (drop from netdev hook):              9497500pps
    baseline hash (non-ranged entries):            4685436pps
    baseline rbtree (match on first field only):   1354978pps
    set with 1000 full, ranged entries:            1052188pps
  port,proto                                                    [ OK ]
    baseline (drop from netdev hook):             10749256pps
    baseline hash (non-ranged entries):            6774103pps
    baseline rbtree (match on first field only):   2819211pps
    set with 30000 full, ranged entries:            283492pps
  net6,port,mac                                                 [ OK ]
    baseline (drop from netdev hook):              9463935pps
    baseline hash (non-ranged entries):            3777039pps
    baseline rbtree (match on first field only):   2943527pps
    set with 10 full, ranged entries:              1927899pps
  net6,port,mac,proto                                           [ OK ]
    baseline (drop from netdev hook):              9502200pps
    baseline hash (non-ranged entries):            3637739pps
    baseline rbtree (match on first field only):   1342323pps
    set with 1000 full, ranged entries:             753960pps
  net,mac                                                       [ OK ]
    baseline (drop from netdev hook):             10065715pps
    baseline hash (non-ranged entries):            5082895pps
    baseline rbtree (match on first field only):   2677391pps
    set with 1000 full, ranged entries:            1215104pps

I would re-run tests on v3 patches and include the comparisons in
commit messages.

By the way, as you can see, even though the comparison with rbtree is
unfair (comparing > 1 fields adds substantial complexity), without AVX2
it doesn't scale as nicely.

I plan to propose some optimisations that should substantially improve
the non-vectorised case, but what I have in mind right now is a bit
convoluted and I would skip it in this initial submission.

-- 
Stefano