On Mon, May 15, 2023 at 12:13:43AM -0700, Eric Biggers wrote:
> Sure, given that this is an optimization problem with a very small scope
> (decoding 6 fields from a bitstream), I was hoping for something easier and
> faster to iterate on than setting up a full kernel + bcachefs test environment
> and reverse engineering 500 lines of shell script. But sure, I can look into
> that when I have a chance.

If you were actually wanting to help, that repository is the tool I use
for kernel development and testing - it's got documentation.

It builds a kernel, boots a VM and runs a test in about 15 seconds, no
need for lifting that code out to userspace.

> > Your approach wasn't any faster than the existing C version.
>
> Well, it's your implementation of what you thought was "my approach". It
> doesn't quite match what I had suggested. As I mentioned in my last email, it's
> also unclear that your new code is ever actually executed, since you made it
> conditional on all fields being byte-aligned...

Eric, I'm not an idiot, that was one of the first things I checked. No
unaligned bkey formats were generated in my tests.

The latest iteration of your approach that I looked at compiled to ~250
bytes of code, vs. ~50 bytes for the dynamically generated unpack
functions. I'm sure it's possible to shave a bit off with some more
work, but looking at the generated code it's clear it's not going to be
competitive.