On Mon, Mar 20, 2023 at 04:02:46PM -0400, Taylor Blau wrote: > Both `read_be32()` and `read_u8()` are defined as inline dating back > to b5007211b6 (pack-bitmap: do not use gcc packed attribute, > 2014-11-27), though that commit does not hint at why the functions were > defined with that attribute. I think any non-header inline like this can implicitly be assumed to be "I thought it would make things faster". ;) > However (at least with GCC 12.2.0, at the time of writing), the > resulting pack-bitmap.o contains the same instructions with or without > the inline attribute applied to these functions: > > $ make O=3 pack-bitmap.o && mv pack-bitmap.o{,.before} > [ apply this patch ] > $ make O=3 pack-bitmap.o && mv pack-bitmap.o{,.after} > $ objdump -d pack-bitmap.o.before >before > $ objdump -d pack-bitmap.o.after >after > $ diff -u pack-bitmap.o.{before,after} > --- before 2023-03-15 18:54:17.021580095 -0400 > +++ after 2023-03-15 18:54:21.853552218 -0400 > @@ -1,5 +1,5 @@ > > -pack-bitmap.o.before: file format elf64-x86-64 > +pack-bitmap.o.after: file format elf64-x86-64 > > Disassembly of section .text: > > So defining these functions as inline is at best a noop, and at worst > confuses the reader into thinking that there is some trickier reason > that they are defined as inline when there isn't. Nice digging. The "inline" is really just a hint to the compiler here, and obviously it does not need that hint. I do wonder if that is still true after you make them more complicated in a later patch in the series. On the other hand, I doubt that these need to be very optimized at all. If there were a tight loop of single-byte reads, the function overhead might be noticeable. But generally we're reading only a few items from the beginning of each entry, and then reading the bulk of the data via ewah_read_mmap(). So I think the overall argument is "let the compiler decide what is good to inline and what is not". -Peff