>> On 2/8/24 10:04 AM, Yonghong Song wrote:
>>>
>>> On 2/8/24 8:51 AM, Jose E. Marchesi wrote:
>>>>> On Thu, 2024-02-08 at 16:35 +0100, Jose E. Marchesi wrote:
>>>>> [...]
>>>>>
>>>>>> If the compiler generates the same assembly code for profile2.c
>>>>>> for before and after, that means that the loop does _not_ get
>>>>>> unrolled when profiler.inc.h is built with -O2 but without
>>>>>> #pragma unroll.
>>>>>>
>>>>>> But what if #pragma unroll is used?  If it unrolls then, that
>>>>>> would mean that the pragma does something more than
>>>>>> -funroll-loops/-O2.
>>>>>>
>>>>>> Sorry if I am not making sense.  Stuff like this confuses me to
>>>>>> no end ;)
>>>>> Sorry, I messed up while switching branches :(
>>>>> Here are the correct stats:
>>>>>
>>>>> | File            | insn # | insn # |
>>>>> |                 | before |  after |
>>>>> |-----------------+--------+--------|
>>>>> | profiler1.bpf.o |  16716 |   4813 |
>>>> This means:
>>>>
>>>> - With both `#pragma unroll' and -O2 we get 16716 instructions.
>>>> - Without `#pragma unroll' and with -O2 we get 4813 instructions.
>>>>
>>>> Weird.
>>>
>>> Thanks for the analysis. I can reproduce with vs. without '#pragma
>>> unroll' at -O2 level, the number of generated insns is indeed
>>> different, quite dramatically as the above numbers. I will do some
>>> checking in compiler.
>>
>> Okay, a quick checking compiler found that
>> - with "#pragma unroll" means no profitability test and do full
>>   unroll as instructed
>
>
> I don't think clang's `#pragma unroll' does full unroll.
>
> On one side, AFAIK `pragma unroll' is supposed to be equivalent to
> `pragma clang loop unroll(enable)', which is different to `pragma
> clang loop unroll(full)'.
>
> On the other, if you replace `pragma unroll' with `pragma clang loop
> unroll(full)' in the BPF selftests you will get branch instruction
> overflows.
>
> What criteria `pragma unroll' in clang uses in order to determine how
> much it unrolls the loop, compared to -O2|-funroll-loops, I don't know.

This makes me wonder, asking from ignorance: what is the benefit/point
for BPF programs to partially unroll a loop?  I would have said either
we unroll them completely in order to avoid verification problems, or we
don't unroll them because the verifier is supposed to handle it the way
it is written...

>> - without "#pragma unroll" mean compiler will do profitability for
>>   full unroll, if compiler thinks full unroll is not profitable,
>>   there will be no unrolling.
>>
>> So for gcc, even users saying '#pragma unroll', gcc still do
>> profitability test?
>
> GCC doesn't support `#pragma unroll'.
>
> Hence in my original patch the macro __pragma_unroll expands to nothing
> with GCC.  That will lead to the compiler perhaps not unrolling the loop
> even with -O2|-funroll-loops.
>
>>
>>>
>>>>
>>>>> | profiler2.bpf.o |   2088 |   2050 |
>>>> - Without `#pragma unroll' and with -O2 we get 2088 instructions.
>>>> - With `#pragma loop unroll(disable)' and with -O2 we get 2050
>>>>   instructions.
>>>>
>>>> Also surprising.
>>>>
>>>>> | profiler3.bpf.o |   4465 |   1690 |
>>>
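
FWIW, a minimal sketch of the pragma spellings being compared in this
thread, applied to a toy loop.  The macro names (UNROLL_HINT,
UNROLL_FULL, UNROLL_OFF) and the functions are made up for illustration
and are not the ones used in profiler.inc.h.  On non-clang compilers
the macros expand to nothing, which is an assumption mirroring what was
said above about __pragma_unroll with GCC.

```c
/* Toy loop with the unroll pragma spellings discussed in this thread.
 * All names here are hypothetical; this is not the selftests code. */
#define DO_PRAGMA(x) _Pragma(#x)

#if defined(__clang__)
# define UNROLL_HINT DO_PRAGMA(unroll)
# define UNROLL_FULL DO_PRAGMA(clang loop unroll(full))
# define UNROLL_OFF  DO_PRAGMA(clang loop unroll(disable))
#else
/* Assumption: no replacement pragma is used, so the hints vanish and
 * the compiler decides on its own whether to unroll. */
# define UNROLL_HINT
# define UNROLL_FULL
# define UNROLL_OFF
#endif

#define N 8

/* No pragma: unrolling is left to the optimization level. */
static int sum_plain(const int *v)
{
	int s = 0;

	for (int i = 0; i < N; i++)
		s += v[i];
	return s;
}

/* `#pragma unroll' on clang. */
static int sum_hint(const int *v)
{
	int s = 0;

	UNROLL_HINT
	for (int i = 0; i < N; i++)
		s += v[i];
	return s;
}

/* `#pragma clang loop unroll(full)' on clang. */
static int sum_full(const int *v)
{
	int s = 0;

	UNROLL_FULL
	for (int i = 0; i < N; i++)
		s += v[i];
	return s;
}

/* `#pragma clang loop unroll(disable)' on clang. */
static int sum_off(const int *v)
{
	int s = 0;

	UNROLL_OFF
	for (int i = 0; i < N; i++)
		s += v[i];
	return s;
}
```

All four functions compute the same value; only the generated code may
differ.  Building with -O2 -S and diffing the assembly per function is
one way to see what each spelling does, and the outcome will depend on
the compiler version and target.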