On Fri, Aug 21, 2020 at 12:57 PM Arvind Sankar <nivedita@xxxxxxxxxxxx> wrote:
>
> Look, four stores into memset(), yeah that's a bit weird. I didn't think
> you meant "four" literally. But in any case, that has nothing to do with
> the topic at hand. It would be just as bad if it was a 16-byte structure
> being initialized with an out-of-line memset() call.

Actually, I mis-remembered. It wasn't four stores. It was two.

We have this lovely "sas_ss_reset()" function that initializes three
fields in a structure (two to zero, one to '2').

And we used it in a critical place that didn't allow function calls
(because we have magic rules with the SMAP instructions).

And clang turned the initialization into a memset(). Which then
triggered our "you can't do that here" check on the generated code.

This is the kind of special rules we sometimes can have for code
generation, where the compiler really doesn't understand that no, you
can't just replace this code sequence with a function call, because
there are things going on around it that really mean that the code
should be generated the way we wrote it.

> But coming back to the actual topic: it is fine if the compiler turns
> four stores into __builtin_memset(). A size-16 or -32 __builtin_memset()
> will get inlined anyway. There's a lot of garbage here if you look
> closely: check out what gcc does to initialize a 7-character array with
> zeros at -Os.

Yeah. The reason we had to make -Os be a non-preferred thing is
because gcc has some really sad things it does when optimizing for
size.

I happen to believe that I$ matters, but -Os made _such_ a mess of
things that it was untenable to use ;(

                Linus
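
For readers who want to see the shape of the code being discussed, here is a minimal sketch of a three-field reset like the one described above. The struct name, field names, field types, and the macro are assumptions made up for illustration; this is not the kernel's actual definition, only the same pattern of two zero stores plus one store of '2'.

```c
#include <stddef.h>

/* Hypothetical stand-in for the three fields that get reset; the names
 * and types are assumptions for illustration, not the kernel's layout. */
struct sigstack_sketch {
	unsigned long sp;
	size_t size;
	unsigned int flags;
};

/* The one non-zero value mentioned above ('2'); the name is made up. */
#define SIGSTACK_FLAG_SKETCH 2u

/*
 * Three plain stores, as written in the source. The two zero stores hit
 * adjacent fields, so a compiler is allowed to merge them into a call to
 * memset(), which is a problem wherever function calls are not permitted.
 */
static inline void sigstack_reset_sketch(struct sigstack_sketch *s)
{
	s->sp = 0;
	s->size = 0;
	s->flags = SIGSTACK_FLAG_SKETCH;
}
```

Whether a memset() call actually shows up depends on the compiler, its version, and the flags in use, so the only reliable check is to look at the generated assembly (for example by compiling with `-S`).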