Re: [PATCH nf-next] nft_set_pipapo_avx2: Skip LDMXCSR, we don't need a valid MXCSR state

Andy Lutomirski <luto@xxxxxxxxxx> · Tue, 18 May 2021 09:48:41 -0700

On 5/18/21 9:01 AM, Pablo Neira Ayuso wrote:
> On Mon, May 10, 2021 at 07:58:52AM +0200, Stefano Brivio wrote:
>> We don't need a valid MXCSR state for the lookup routines, none of
>> the instructions we use rely on or affect any bit in the MXCSR
>> register.
>>
>> Instead of calling kernel_fpu_begin(), we can pass 0 as mask to
>> kernel_fpu_begin_mask() and spare one LDMXCSR instruction.
>>
>> Commit 49200d17d27d ("x86/fpu/64: Don't FNINIT in kernel_fpu_begin()")
>> already speeds up lookups considerably, and by dropping the MXCSR
>> initialisation we can now get a much smaller, but measurable, increase
>> in matching rates.
>>
>> The table below reports matching rates and a wild approximation of
>> clock cycles needed for a match in a "port,net" test with 10 entries
>> from selftests/netfilter/nft_concat_range.sh, limited to the first
>> field, i.e. the port (with nft_set_rbtree initialisation skipped), run
>> on a single AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB 

Please consider reverting this patch.  You have papered over the actual
problem, which is that the kernel does not get the AVX pipeline stalls
right.  LDMXCSR merely exacerbates the problem, but your patch won't
really fix it.

A real fix is on my radar.  If you end up applying this patch, I'll
probably revert it later.
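
For reference, the change in the quoted patch description amounts to
swapping kernel_fpu_begin() for kernel_fpu_begin_mask(0) around the
lookup. Below is a rough user-space sketch of what that mask controls.
It is not the kernel implementation: the mock_* functions, the counters
and count_ldmxcsr() are hypothetical stand-ins, and only the KFPU_387 /
KFPU_MXCSR bits and the x86-64 default mask are taken from
arch/x86/include/asm/fpu/api.h as I remember it.

```c
#include <assert.h>

/* Mask bits, mirroring arch/x86/include/asm/fpu/api.h */
#define KFPU_387	(1u << 0)	/* 387 state will be initialised */
#define KFPU_MXCSR	(1u << 1)	/* MXCSR will be initialised */

/* Hypothetical counters standing in for the real instructions */
static unsigned int ldmxcsr_count;
static unsigned int fninit_count;

/* User-space stand-in for kernel_fpu_begin_mask(): only counts */
static void mock_kernel_fpu_begin_mask(unsigned int kfpu_mask)
{
	/* Reject bits this sketch does not know about */
	assert(!(kfpu_mask & ~(KFPU_387 | KFPU_MXCSR)));

	if (kfpu_mask & KFPU_MXCSR)
		ldmxcsr_count++;	/* the kernel would issue LDMXCSR here */
	if (kfpu_mask & KFPU_387)
		fninit_count++;		/* the kernel would issue FNINIT here */
}

/* On x86-64, kernel_fpu_begin() passes KFPU_MXCSR only */
static void mock_kernel_fpu_begin(void)
{
	mock_kernel_fpu_begin_mask(KFPU_MXCSR);
}

/*
 * Count the LDMXCSR instructions a given number of lookups would
 * trigger, with the default entry point or with a zero mask.
 */
static unsigned int count_ldmxcsr(int use_zero_mask, unsigned int lookups)
{
	unsigned int i;

	ldmxcsr_count = 0;
	for (i = 0; i < lookups; i++) {
		if (use_zero_mask)
			mock_kernel_fpu_begin_mask(0);
		else
			mock_kernel_fpu_begin();
	}
	return ldmxcsr_count;
}
```

With this sketch, count_ldmxcsr(0, n) returns n and count_ldmxcsr(1, n)
returns 0, which is the one instruction per lookup the patch saves. It
says nothing about the pipeline behaviour discussed above.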