> On 2/8/24 10:04 AM, Yonghong Song wrote:
>>
>> On 2/8/24 8:51 AM, Jose E. Marchesi wrote:
>>>> On Thu, 2024-02-08 at 16:35 +0100, Jose E. Marchesi wrote:
>>>> [...]
>>>>
>>>>> If the compiler generates the same assembly code for profile2.c
>>>>> before and after, that means that the loop does _not_ get
>>>>> unrolled when profiler.inc.h is built with -O2 but without
>>>>> #pragma unroll.
>>>>>
>>>>> But what if #pragma unroll is used?  If it unrolls then, that
>>>>> would mean that the pragma does something more than
>>>>> -funroll-loops/-O2.
>>>>>
>>>>> Sorry if I am not making sense.  Stuff like this confuses me to
>>>>> no end ;)
>>>> Sorry, I messed up while switching branches :(
>>>> Here are the correct stats:
>>>>
>>>> | File            | insn # | insn # |
>>>> |                 | before | after  |
>>>> |-----------------+--------+--------|
>>>> | profiler1.bpf.o |  16716 |   4813 |
>>> This means:
>>>
>>> - With both `#pragma unroll' and -O2 we get 16716 instructions.
>>> - Without `#pragma unroll' and with -O2 we get 4813 instructions.
>>>
>>> Weird.
>>
>> Thanks for the analysis. I can reproduce with vs. without '#pragma
>> unroll' at -O2 level, the number of generated insns is indeed
>> different, quite dramatically as the above numbers. I will do some
>> checking in compiler.
>
> Okay, a quick checking compiler found that
>   - with "#pragma unroll" means no profitability test and do full
>     unroll as instructed

I don't think clang's `#pragma unroll' does full unroll.

On one side, AFAIK `pragma unroll' is supposed to be equivalent to
`pragma clang loop unroll(enable)', which is different to
`pragma clang loop unroll(full)'.  On the other, if you replace
`pragma unroll' with `pragma clang loop unroll(full)' in the BPF
selftests you will get branch instruction overflows.

What criteria `pragma unroll' in clang uses in order to determine how
much it unrolls the loop, compared to -O2|-funroll-loops, I don't know.

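To make the three spellings concrete, here is a minimal sketch.  The
function names and the trip count N are made up for illustration and
do not come from the selftests.  All three functions compute the same
value; the only thing that may differ is how much unrolling clang
decides to do.

```c
#define N 8 /* hypothetical compile-time trip count */

/* Bare spelling; AFAIK clang treats it like unroll(enable). */
int sum_pragma_unroll(const int *v)
{
    int s = 0;
#pragma unroll
    for (int i = 0; i < N; i++)
        s += v[i];
    return s;
}

/* Explicit "enable": the compiler chooses how far to unroll. */
int sum_unroll_enable(const int *v)
{
    int s = 0;
#pragma clang loop unroll(enable)
    for (int i = 0; i < N; i++)
        s += v[i];
    return s;
}

/* Explicit "full": asks for the loop to be unrolled completely. */
int sum_unroll_full(const int *v)
{
    int s = 0;
#pragma clang loop unroll(full)
    for (int i = 0; i < N; i++)
        s += v[i];
    return s;
}
```

Compilers other than clang will normally ignore these pragmas,
possibly with an unknown-pragma warning.  Comparing the instruction
counts of the three functions built with clang at -O2 would be one
way to see whether the bare spelling behaves like "enable" or "full".
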
>   - without "#pragma unroll" mean compiler will do profitability for full unroll,
>     if compiler thinks full unroll is not profitable, there will be no unrolling.
>
> So for gcc, even users saying '#pragma unroll', gcc still do
> profitability test?

GCC doesn't support `#pragma unroll'.  Hence in my original patch the
macro __pragma_unroll expands to nothing with GCC.  That will lead to
the compiler perhaps not unrolling the loop even with
-O2|-funroll-loops.

>
>>
>>>
>>>> | profiler2.bpf.o |   2088 |   2050 |
>>> - Without `#pragma unroll' and with -O2 we get 2088 instructions.
>>> - With `#pragma loop unroll(disable)' and with -O2 we get 2050
>>>   instructions.
>>>
>>> Also surprising.
>>>
>>>> | profiler3.bpf.o |   4465 |   1690 |
>>
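
For reference, the GCC behaviour described above amounts to something
like the following sketch.  Only the name __pragma_unroll comes from
the patch; the exact definition shown here, MAX_DEPTH and depth_sum()
are assumptions made for illustration.

```c
/*
 * Sketch only, not the patch itself.  With clang the macro emits the
 * pragma; with any other compiler it expands to nothing, so the loop
 * is left to the optimizer's own profitability heuristics.
 */
#if defined(__clang__)
#define __pragma_unroll _Pragma("unroll")
#else
#define __pragma_unroll /* nothing, e.g. with GCC */
#endif

#define MAX_DEPTH 4 /* hypothetical loop bound */

/* Hypothetical helper: sums MAX_DEPTH entries of levels[]. */
int depth_sum(const int *levels)
{
    int total = 0;

    __pragma_unroll
    for (int i = 0; i < MAX_DEPTH; i++)
        total += levels[i];
    return total;
}
```

The result of depth_sum() is the same either way; what changes between
compilers is only whether the loop carries an unrolling hint.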