On Fri, Nov 06, 2020 at 11:44:00AM -0800, Linus Torvalds wrote: > On Fri, Nov 6, 2020 at 11:05 AM Song Liu <songliubraving@xxxxxx> wrote: > (a) the conditional following for the return value: > > select.32 %r4 <- %arg1, $0xffffffff, $0 > select.32 %r6 <- %r4, $0xffffffff, $0 > > Note how it doesn't trigger CSE, because it's not the same instruction > (%arg1 vs %r4), but sparse *could* see that a select that depends on a > previous select that has constant arguments can be simplified to be > conditional on the original select value instead, so it *could* be > CSE'd into one single thing. > > So the second "select" could be optimized away by realizing that it > really gets the same value as the first one. That *should* be trivial > to do in sparse by just having a simple pattern for select > simplification. > > And once that optimization is in place, the second optimization would be I think I may even have already this in an half-polished form. > (b) trivial branch following: "conditional branch to conditional > branch with the same condition". > > ie we'd have the pattern of > > cbr %arg1, .L5, .L4 > ... > .L5: > cbr %arg1, .L6, .L2 > > and now a trivial branch following optimization would see "oh, the > first target of the first cbr is to another cbr that has the same > condition, so we can rewrite the first cbr as > > cbr %arg1, .L6, .L4 > > instead. This, should already be done (by yourself, many years ago). > And once you've done that trivial branch following, the .L5 target has > only that fallthrough source from .L4, and the code becomes > > cmp: > .L0: > <entry-point> > select.32 %r4 <- %arg1, $0xffffffff, $0 > cbr %arg1, .L6, .L4 > > .L4: > call spin_lock > cbr %arg1, .L6, .L2 > > .L2: > call spin_unlock > br .L6 > > .L6: > ret.32 %r4 > > which still isn't good enough, because it still has that conditional > branch that is now pointless. > > Removing the second conditional branch entirely would require having a > "look, it's now dominated by a previous conditional branch on the same > condition, so it can't trigger". > > That's the only remaining optimization that isn't purely local. Yes. > Dominance analysis is always painful. sparse does do it (for memory > accesses), but it's a *lot* more complex than just some local > single-instruction pattern that looks at the source or the target of > that single instruction. This shouldn't be a big problem since we already have the dominance tree (but it's not yet updated when there is changes n the CFG). OTOH, the memops simplification creates invalid phi-nodes which indirectly creates all sort of problems. I've several fixes for this but none I'm really happy with. > Maybe this gives Luc some ideas, though. > > Or maybe Luc will point out that even my "simple" optimizations are > slightly more painful than I think they are for some reason that I > haven't looked at. No no, it's certainly more than a few hours of work but it's also not an immense task. Of course, the devil is always in the details. There is also the option of explicitly handling conditional contexts. I did maybe 6 months ago, in 2 or 3 different ways. It worked well, and would probably solve the problem here but: * it needs new annotations * it's OK when the condition is simple ( = 0 or != 0), it could be adapted for some others (like, for example, >= 0) but more general condition would become hard to handle (the annotations needs to know about the condition). * the benefits were disappointing because, in most cases I saw, it displaced the problem to somewhere else where a 'real' optimization was needed. This is why I stopped to look at it. But so, yes, what you propose here makes a lot of sense and would be very valuable. I'll take a look at it soon (but my stack is already so full that I'm quite reluctant to begin anything new (I currently have 208 topic branches with 120 of them being valid, interesting seriess waiting some polishing, validation and patch description :( ). -- Luc