> On Feb 4, 2019, at 2:53 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote: > > On Sat, Feb 02, 2019 at 05:49:35PM -0500, Chuck Lever wrote: >>>> Byte-swapping causes a CPU pipeline bubble on some processors. When >>>> a decoder is comparing an on-the-wire value for equality, byte- >>>> swapping can be avoided by comparing it directly to a pre-byte- >>>> swapped constant value. >>> >>> Which ones? >> >> I assume you mean on which processors have I observed CPU cycle >> spikes around bswap instructions. > > Yes. > >> I've seen this behavior only >> on Intel processors of various families. > > Interesting. In general we should not do separate byte swap instructions > on x86, as MOVBE can be used to do a load or store with an included > byteswap, and I thought the whole point for that was that they could > be handled in the same cycle. > > In fact https://www.agner.org/optimize/instruction_tables.pdf > says that movbe is generally a single cycle instruction. > >> Would you prefer a different justification for this clean-up? > > I don't really care about the cleanup, it is just that the explanation > goes against conventional wisdom, which is why I was a little surpised. > > And that is not just the cycles, but also as Trond pointed out > that the Linux byte swapping macro on constants should usually be > optimized away at compile time anyway. They are. The problem is that we are byte-swapping the incoming wire data and then comparing to CPU-endian constants in some hot paths. It's better to leave the incoming data alone and compare to a pre- byte-swapped constant. This patch adds some of these constants that were missing, in preparation for fixing the hot paths. That is apparently not clear from the patch description, so I will endeavor to improve it. -- Chuck Lever