On Tue, May 09, 2017 at 03:38:43AM -0700, Christopher Li wrote:
> On Mon, May 8, 2017 at 11:57 AM, Luc Van Oostenryck
> <luc.vanoostenryck@xxxxxxxxx> wrote:
> > Fix this by calling kill_unreachable_bbs() after having
> > simplified the switch into a branch. This avoids creating
> > a cycle because of the removed phisrc in the header and,
> > as an added benefit, avoids wasting time trying to
> > simplify BBs that are unreachable.
> >
> > In addition, it is now useless to call kill_bb() for each
> > of the removed switch's children, as kill_unreachable_bbs()
> > will do that too.
>
> I think the fix is correct. I have some very minor comments.
>
> > diff --git a/linearize.c b/linearize.c
> > index a9f36b823..ee9591897 100644
> > --- a/linearize.c
> > +++ b/linearize.c
> > @@ -642,8 +642,6 @@ static void set_activeblock(struct entrypoint *ep, struct basic_block *bb)
> >  static void remove_parent(struct basic_block *child, struct basic_block *parent)
> >  {
> >  	remove_bb_from_list(&child->parents, parent, 1);
> > -	if (!child->parents)
> > -		kill_bb(child);
>
> This makes every caller of remove_parent() responsible for cleaning
> up the unreachable basic blocks. Currently there is only one caller,
> insert_branch(), which is fine. But if a developer writes new code
> calling this function, he might forget to clean up the unreachable
> basic blocks. It's kind of a trap.

Yes, true. I didn't like it very much myself and hesitated to remove
the helper and call remove_bb_from_list() directly from
insert_branch(). I'll do this now. But note that in 'normal'
situations it doesn't really matter whether the BB is killed now or
later, because:
1) only the direct descendant is killed; the indirect ones have to
   wait anyway;
2) kill_unreachable_bbs() is called after each cleanup loop anyway.

> >  /* Change a "switch" into a branch */
> > @@ -670,6 +668,7 @@ void insert_branch(struct basic_block *bb, struct instruction *jmp, struct basic
> >  		remove_parent(child, bb);
> >  	} END_FOR_EACH_PTR(child);
> >  	PACK_PTR_LIST(&bb->children);
> > +	kill_unreachable_bbs(bb->ep);
>
> It is correct to do so, but kill_unreachable_bbs() is relatively
> expensive: it needs to go through all the basic blocks in the
> function.

I don't really agree with this. When writing the patch I at first
thought the same, but then I changed my mind because:
1) it *only* needs to run through all the BBs (in fact the cost is
   O(number of BB edges)), while there are a lot of places where we
   loop through all *instructions*;
2) the fact that the unneeded BBs are killed as soon as possible means
   that the normal CSE + simplification no longer wastes time on the
   indirect descendant blocks (I can redo the measurements, but I
   think I saw a measurable speedup because of this).

> If the function has more than one switch statement that gets
> simplified, then each simplification will go through every basic
> block again. Preferably kill_unreachable_bbs() would run only once,
> at the final stage.
>
> I don't know how feasible it would be to have remove_parent() mark
> the "ep" as needing cleanup, and then kill off the dead basic blocks
> all together at some later point.

Don't forget that the only reason for this patch is a *correctness*
issue when the removed BB is followed by a loop (which, if it has no
other entry point (which is generally the case), thus becomes
unreachable). We can mark the ep (the easiest is to reuse repeat_phase
and add a REPEAT_FLOW flag, which I will need to do anyway for
something else; I'm polishing the patch) but it won't change anything,
as we *must not* allow simplifications to be done once there is such a
dead cycle, and the only way to detect such cycles is via something
like the marker algorithm used by kill_unreachable_bbs().
> Currently, simplifying a switch statement should be relatively rare.

Like a lot of things, it depends. For example, the kernel uses a lot
of inline functions or macros with code like:

	...
	switch (sizeof(x)) {
	case 1: ...
	case 2: ...
	case 4: ...
	case 8: ...
	default: issue a build error
	}

Also, the simplification concerned here is insert_branch() which, as I
explained in the commit message, is used by the switch simplification
but also by other branch simplifications.

> Having two switch statements in the same function get simplified at
> the same time should be even rarer. It might not be worth the effort
> to optimize for that.
>
> As it is, the patch is acceptable.

I also would really prefer not to have to walk the BBs, but I really
think we don't have any good alternative.

If we really, really wanted to avoid calling kill_unreachable_bbs(),
we could, in simplify_one_memop(), once we detect a possible loop in
the address calculation (which is the core of the problem here:
avoiding the detection of such a false loop), instead of issuing a
message, first call kill_unreachable_bbs() and then redo the memop
simplification if some BBs were killed and the concerned instruction
still exists. But:
- it's more complicated than needed;
- the advantage would only hold if kill_unreachable_bbs() were costly
  (which it is not, really);
- calling kill_unreachable_bbs() early has the advantage of avoiding
  memop and instruction simplification and CSE on code that is in
  fact unreachable (and those are more costly because they are done
  on every instruction).

I don't have the numbers anymore, but when testing, what I saw (and I
was a bit surprised at first) was that with the patch, compiling the
kernel with sparse and running test-linearize on GCC's testsuite and
such was in fact slightly faster (but admittedly, the difference was
small and could have been within the accuracy of the measurement; I
can redo the measurement if needed).
Also, when talking about performance, if you do some profiling you
very quickly realize that:

- CSE is quite costly (this is not new, cfr.
  http://marc.info/?l=linux-sparse&m=111616763219436&w=4 and commit
  b5a8032aaeaf2121a642ead32653d4288fa2983d). I have looked at this
  more than once but I don't see what we can do, except:
  - avoid calling CSE on dead code;
  - call CSE less often, possibly by not trying to reach the fixed
    point; of course this impacts the quality of the code.
- Memory allocation eats a lot of cycles too (about 16-20%, not only
  in userland but also in things like the kernel's clear_page()).
  But see how you can easily win a 5% speedup:
  http://marc.info/?l=linux-sparse&m=149198627609282&w=4
  There is a lot that can be done here. I have a few things in
  preparation, but it won't be for soon.
- __add_ptr_list() is also surprisingly high in the profile, but I
  think that's simply because it really is called a lot.

-- Luc
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html