Junio C Hamano <gitster@xxxxxxxxx> writes:

> David Kastrup <dak@xxxxxxx> writes:
>
>> It's snake oil making debugging harder.
>
> OK, that is a sensible argument.
>
>>> This was fun ;-)
>>
>> At the expense of seriously impacting my motivation to do any
>> further code cleanup on Git.
>
> Well, I said it was "fun" because I was learning from a new person
> who hasn't made much technical or code aesthetics discussion here
> so far.  If teaching others frustrates you and stops you from
> contributing to the project, that is a loss.

So let's post something noteworthy and technical: the current code
iteration sported the following routine:

static struct blame_entry *blame_merge(struct blame_entry *list1,
				       struct blame_entry *list2)
{
	struct blame_entry *p1 = list1, *p2 = list2, **tail = &list1;

	if (!p1)
		return p2;
	if (!p2)
		return p1;
	if (p1->s_lno <= p2->s_lno) {
		do {
			tail = &p1->next;
			if ((p1 = *tail) == NULL) {
				*tail = p2;
				return list1;
			}
		} while (p1->s_lno <= p2->s_lno);
	}
	for (;;) {
		*tail = p2;
		do {
			tail = &p2->next;
			if ((p2 = *tail) == NULL) {
				*tail = p1;
				return list1;
			}
		} while (p1->s_lno > p2->s_lno);
		*tail = p1;
		do {
			tail = &p1->next;
			if ((p1 = *tail) == NULL) {
				*tail = p2;
				return list1;
			}
		} while (p1->s_lno <= p2->s_lno);
	}
}

This is decidedly more complex than the equivalent

static struct blame_entry *blame_merge(struct blame_entry *list1,
				       struct blame_entry *list2)
{
	struct blame_entry *p1 = list1, *p2 = list2, **tail = &list1;

	while (p1 && p2) {
		if (p1->s_lno <= p2->s_lno) {
			*tail = p1;
			p1 = *(tail = &p1->next);
		} else {
			*tail = p2;
			p2 = *(tail = &p2->next);
		}
	}
	*tail = p1 ? p1 : p2;
	return list1;
}

Why not use the latter one?  Apart from the somewhat
obsessive-compulsive avoidance of checking the same condition twice
in a row (which compilers nowadays are quite capable of handling by
themselves), the actually decisive difference is that the former
avoids rewriting links that already point to the right place.
Those are the only write accesses that take place, and since we are
talking about _sorting_, it is to be expected that in the long run
every record ends up in a cacheline different from its ultimate
neighbors.  Also, assuming a more or less random distribution and
equally sized chains, about half of the links will already be
correct.  In practice the situation tends to be more degenerate,
leading to a _higher_ ratio of already correct links.  Cutting the
expensive writeout of cachelines by a factor of more than two makes
quite a difference in performance.

Does it matter here?  Unlikely.  Profiling tools tend to be useless
for seeing the impact of dirty cachelines, since their operation
significantly interferes with the read/write patterns, so this kind
of detail often escapes notice.  But the total contribution of this
stage to the runtime of "git blame" will be insignificant either way.

> new blood bringing in new ideas is fine, but I'd want to see those
> new ideas backed by solid thinking, and that means they may have to
> explain themselves to those who are new to their ideas.

Well, there _is_ solid thinking behind the above code, but there
_also_ is solid thinking behind the notion that it won't matter in
this case.  I still prefer not to push out code into the wild that I
would hesitate to employ in _all_ comparable circumstances.

Consider it a pattern or a style: a lot of thinking went into it
once, and what that should buy you is never having to think about
this issue again.  C++ is better at actual code reuse, but with C you
basically get pattern reuse.  Your head is instantiating the
templates.  And it's a good idea not to crowd one's head with
unnecessary and/or suboptimal patterns.

-- 
David Kastrup