On Thu, Feb 6, 2020 at 12:06 PM Luc Van Oostenryck <luc.vanoostenryck@xxxxxxxxx> wrote: > > However, I thought that the 5+seconds of runtime with 2.9Gb of memory > consumption I reported earlier was somehow excessive. So, I looked > at the preprocessed file and my editor (and several other tools) chocked > near the end ... It appears that one line is 2.8Mb on a total of 6.2MB > and contains 28968 times the expression for num_possible_cpus(). Whee.. > The origin of this is situted at line 647: > smap->bucket_log = max_t(u32, 1, ilog2(roundup_pow_of_two(...)); > because ilog2() is an 'interesting' macro which is already expanded > inside roundup_pow_of_two(). Yeah, so we have "const_ilog2()" expanding its argument 63 times (to handle the "just turn it into a constant" case), and so ilog2(roundup_pow_of_two(x)) where both ilog2() and roundup_pow_of_two() contains a const_ilog() ends up internally essentially expanding x 63*63 times. Plus a couple for the non-constant case. And in this case 'x' wasn't all that simple to begin with on SMP. And then the "max_t" thing adds another factor of 7 due to the whole "let's keep a constant expression constant" with all the careful "can I use a variable or not" code. So you get 7*63*63 expansions of num_possible_cpus(), plus that "some slop" for the other cases. I wonder if we could just make sparse _warn_ about this kind of situation with expressions that are very big - even if they turn into nothing)? Because I bet it's not good for a real compiler either. Compile time does matter to people, and this clearly wasn't intentional. And even if we apply a patch to avoid it here, that's fine, but others might be lurking. Of course, sometimes you do want to have that kind of nested expansion on purpose - creating huge expression lists for some initializer or whatever. And sometimes the constant value is what you care about, and are willing to have complex expressions. I don't think anybody intended for the expression to be quite _that_ complex, though.. > This exists since the introduction of this file in commit > 6ac99e8f23d4 bpf: Introduce bpf sk local storage > but back then it made sparse consume only about 500Mb of memory on it. Well, the fact that sparse memory use has exploded by a factor of 6 is not exactly good either. What happened? > but a better patch should, I think, directly use ilog2() and avoid the roundup. No, I think it would be better to just split that expression up. Right now the code does: /* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */ smap->bucket_log = max_t(u32, 1, ilog2(roundup_pow_of_two(num_possible_cpus()))); nbuckets = 1U << smap->bucket_log; so it calculates the log2, and then does "1U << log2". Instead, it could just calculate the nbuckets first, and then do the "log2()" on that: /* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */ nbuckets = max_t(u32, 2, roundup_pow_of_two(num_possible_cpus())); smap->bucket_log = ilog2(buckets); because honestly, that is just a whole lot more legible anyway. Maybe even split _that_ up, and have the max_t as a separate thing. Right now the constant in the comment (2) doesn't match the constant in the code (1) because the code is too dense for its own good. Of course, currently that "too dense for its own good" code ends up evaluating to a constant on UP. Which the easier-to-read code does not. I'm not convinced that it makes sense to optimize for UP that much. Linus