On Fri, May 12, 2023 at 06:57:52PM -0700, Eric Biggers wrote:
> What I would suggest instead is preprocessing the list of 6 field lengths to
> create some information that can be used to extract all 6 fields branchlessly
> with no dependencies between different fields. (And you clearly *can* add a
> preprocessing step, as you already have one -- the dynamic code generator.)
>
> So, something like the following:
>
>     const struct field_info *info = &format->fields[0];
>
>     field0 = (in->u64s[info->word_idx] >> info->shift1) & info->mask;
>     field0 |= in->u64s[info->word_idx - 1] >> info->shift2;
>
> ... but with the code for all 6 fields interleaved.
>
> On modern CPUs, I think that would be faster than your current C code.
>
> You could do better by creating variants that are specialized for specific
> common sets of parameters. During "preprocessing", you would select a variant
> and set an enum accordingly. During decoding, you would switch on that enum and
> call the appropriate variant. (This could also be done with a function pointer,
> of course, but indirect calls are slow these days...)
>
> For example, you mentioned that 8-byte packed keys is a common case. In that
> case there is only a single u64 to decode from, so you could create a function
> that just handles that case:
>
>     field0 = (word >> info->shift) & info->mask;
>
> You could also create other variants, e.g.:
>
> - 16-byte packed keys (which you mentioned are common)
> - Some specific set of fields have zero width so don't need to be extracted
>   (which it sounds like is common, or is it different fields each time?)
> - All fields having specific lengths (are there any particularly common cases?)
>
> Have you considered any of these ideas?

I like that idea. Gonna hack some code... :)
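
For reference, a rough sketch of the preprocessing idea quoted above. This is
not code from the thread or from the actual key format: the names (struct
key_format, format_init(), extract(), unpack()) are hypothetical, and it
assumes fields are packed low-bit-first starting at words[0], which may not
match the real layout (the quoted snippet indexes word_idx - 1, for instance).
It only illustrates per-field shift/mask data computed once, a branchless
extract with no dependencies between fields, and an enum-selected variant for
the single-u64 case.

```c
#include <stdint.h>

#define NR_FIELDS 6

/* Hypothetical per-field data, computed once per format. */
struct field_info {
	uint8_t  lo_idx;  /* word holding the field's low bits */
	uint8_t  hi_idx;  /* following word, clamped to the last word */
	uint8_t  shift;   /* bit offset within words[lo_idx], 0..63 */
	uint64_t mask;    /* (1 << width) - 1; 0 for a zero-width field */
};

enum unpack_variant { UNPACK_ONE_WORD, UNPACK_GENERIC };

struct key_format {
	enum unpack_variant variant;
	struct field_info   fields[NR_FIELDS];
};

/*
 * Preprocessing: branches are fine here, this runs once per format.
 * Assumes nr_words >= 1 and that the widths sum to <= 64 * nr_words.
 */
static void format_init(struct key_format *f, const uint8_t widths[NR_FIELDS],
			unsigned nr_words)
{
	unsigned off = 0;

	for (int i = 0; i < NR_FIELDS; i++) {
		struct field_info *info = &f->fields[i];
		unsigned w = widths[i];
		unsigned lo = off / 64, shift = off % 64;

		if (lo >= nr_words) {	/* only a trailing zero-width field */
			lo = nr_words - 1;
			shift = 0;
		}
		info->lo_idx = lo;
		info->hi_idx = lo + 1 < nr_words ? lo + 1 : lo;
		info->shift  = shift;
		info->mask   = w >= 64 ? ~0ULL : (1ULL << w) - 1;
		off += w;
	}
	/* Select the specialized variant for 8-byte packed keys. */
	f->variant = nr_words == 1 ? UNPACK_ONE_WORD : UNPACK_GENERIC;
}

/* Branchless extract of one field that may straddle two words. */
static inline uint64_t extract(const uint64_t *words,
			       const struct field_info *info)
{
	uint64_t lo = words[info->lo_idx] >> info->shift;
	/* Two-step shift avoids an undefined shift by 64 when shift == 0. */
	uint64_t hi = (words[info->hi_idx] << 1) << (63 - info->shift);

	return (lo | hi) & info->mask;
}

static void unpack(const struct key_format *f, const uint64_t *words,
		   uint64_t out[NR_FIELDS])
{
	switch (f->variant) {
	case UNPACK_ONE_WORD:	/* single u64: one shift and mask per field */
		for (int i = 0; i < NR_FIELDS; i++)
			out[i] = (words[0] >> f->fields[i].shift) &
				 f->fields[i].mask;
		break;
	case UNPACK_GENERIC:
		for (int i = 0; i < NR_FIELDS; i++)
			out[i] = extract(words, &f->fields[i]);
		break;
	}
}
```

The loops have a fixed trip count and each iteration depends only on its own
field_info, so the compiler is free to unroll and interleave them. Whether this
actually beats the existing code would need measuring.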