On Mon, May 15, 2023 at 02:18:14AM -0400, Kent Overstreet wrote:
> On Sun, May 14, 2023 at 11:13:46PM -0700, Eric Biggers wrote:
> > On Mon, May 15, 2023 at 01:38:51AM -0400, Kent Overstreet wrote:
> > > On Sun, May 14, 2023 at 11:43:25AM -0700, Eric Biggers wrote:
> > > > I think it would also help if the generated assembly had the handling of the
> > > > fields interleaved. To achieve that, it might be necessary to interleave the C
> > > > code.
> > >
> > > No, that has negligable effect on performance - as expected, for an out
> > > of order processor. < 1% improvement.
> > >
> > > It doesn't look like this approach is going to work here. Sadly.
> >
> > I'd be glad to take a look at the code you actually tried. It would be helpful
> > if you actually provided it, instead of just this "I tried it, I'm giving up
> > now" sort of thing.
>
> https://evilpiepirate.org/git/bcachefs.git/log/?h=bkey_unpack
>
> > I was also hoping you'd take the time to split this out into a userspace
> > micro-benchmark program that we could quickly try different approaches on.
>
> I don't need to, because I already have this:
> https://evilpiepirate.org/git/ktest.git/tree/tests/bcachefs/perf.ktest

Sure, given that this is an optimization problem with a very small scope
(decoding 6 fields from a bitstream), I was hoping for something easier and
faster to iterate on than setting up a full kernel + bcachefs test environment
and reverse engineering 500 lines of shell script. But sure, I can look into
that when I have a chance.

> Your approach wasn't any faster than the existing C version.

Well, it's your implementation of what you thought was "my approach". It
doesn't quite match what I had suggested. As I mentioned in my last email, it's
also unclear that your new code is ever actually executed, since you made it
conditional on all fields being byte-aligned...

- Eric