On Wed, Sep 29, 2021 at 5:19 PM Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote: > ... > arch/x86/mm/amdtopology.c:110:7: remark: 'test_bit' not inlined into > 'amd_numa_init' because too costly to inline (cost=115, threshold=45) > [-Rpass-missed=inline] > if (node_isset(nodeid, numa_nodes_parsed)) { Yeah, I think that we should just do the __always_inline thing. I'd rather have the stupid debug code overhead in the caller - that may end up knowing that the pointer actually is so that the debug code goes away - than have "test_bit()" uninlined because there's so much crazy debug code in it. I also happen to believe that we have too much crazy "instrumentation" crap. Why is that test_bit() word read so magical that it merits a "instrument_atomic_read()"? But I absolutely detest how KCSAN and some other tooling seems to get a free pass on doing stupid things, just because they generated bad warnings so then they can freely generate these much more fundamental problems because the result is a f*cking mess. > Though for the defconfig case...somehow the cost is more than with the > sanitizers... Maybe the solution is that if you have some of the crazy sanitizers, we just say "the end result is not worth even checking". And stop checking all the section mismatches, and all the stack size things. Because it looks like this is more of a real issue: > arch/x86/mm/amdtopology.c:157:7: remark: '__nodes_weight' not inlined > into 'amd_numa_init' because too costly to inline (cost=930, > threshold=45) [-Rpass-missed=inline] > if (!nodes_weight(numa_nodes_parsed)) > ^ Hmm. That's just a "bitmap_weight()", and that function in turn is __always_inline. And the *reason* it is __always_inline is that it really wants to act as a macro, and look at the second argument and do special things if it is a small constant value. And it looks like clang messes things up by simply not doing enough simplification before inlining decisions, so it all looks very complicated to clang, even though when you actually generate code, you have one (of two) very simple code sequences. > > Wouldn't it be better to make > > them always-inline? > > Perhaps, see what that might look like: > https://github.com/ClangBuiltLinux/linux/issues/1302#issuecomment-807260475 > Does that look better? I suspect that in this case, because of clang deficiencies, that __always_inline actually is the right thing to do at least on __nodes_weight. Looking at your comment lower down https://github.com/ClangBuiltLinux/linux/issues/1302#issuecomment-807757878 I really think this is a clang bug, and that you need to do certain simplifications both before _and_ after inlining. Before, because of the inlining cost decisions particularly wrt constant arguments. After, because successful inlining changes things completely. Marking __nodes_weight() be __always_inline just works around clang being broken in this regard. It is _possible_ that it might help to make bitmap_weight() be a macro instead of an inline function, but it's a kind of sad state of affairs if that is required. And it might well fail - if you don't do the constant propagation before making inlining decisions, you'll _still_ end up thinking that bitmap_weight() is very costly because you don't do that __builtin_constant_p() lowering. And then you end up using the (much more expensive) generic function instead of the cheap "look, for single words this is a trivial" thing. > Part of me feels like modpost not warning on those is permitting a > "memory leak," in so far as code that's only called from .init callers > is never reclaimed. Or leaving behind gadgets... I think we can just treat modpost as a "good heuristic". If it catches all the normal cases, it's fine - but it must not have false positives. That's basically true of all warnings. False positive warnings make a warning worthless. That's just *basic*. So the gcc thing is a "ok, we know compilers mess this up if they do partial inlining with constant propagation, so we will suppress what is quite likely a false positive for that case". That clang patch, in comparison? That's just a hack enumerating random cases. TRhere is no logic to it, and there is absolutely zero maintainability. It will cause us to forever just add other random cases to the list, making the whole tooling entirely pointless. See the difference? Maybe clang should just adopt the gcc naming convention, so that we can just use the gcc heuristic. > > clear case of "this inlining failed". This ad-hoc list has cases of > > things that are clearly wrong in general ("test_bit()" must not use > > initdata), and that "ok, the function just doesn't have the right > > section marker. > > Sorry, what do you mean "test_bit() must not use initdata?" Because it > can lead to problems like this? Or...? No, I mean that it is completely unacceptable to add some crazy rule like "you can access this init-data from any context, as long as you use test_bit to do so". That's basically what your rule does. And it's a FUNDAMENTALLY invalid rule. It's simply not true. The rule is invalid, it's just that clang has made such a mess of it that in one particular case it happens to be true. The gcc "rule" is much more reasonable: it's *not* saying "it's ok to access this init-data from test_bit". The gcc rule says "we know gcc messes up our heuristics when out-of-lining with constprop, so we just won't warn because false positives are bad, bad, bad. One rule is fundamentally garbage and wrong. The other rule is a generic "we know this situation cannot be tested for". Very different. Linus