On Tuesday 26 April 2011, Junio C Hamano wrote: > Johan Herland <johan@xxxxxxxxxxx> writes: > > [...] > It needs to document and also mention in the proposed commit log message > how binary files are accounted for. Will do. > [...] > > + if (damage && file->is_binary) > > + /* > > + * binary files counts bytes, not lines. Must find some > > + * way to normalize binary bytes vs. textual lines. > > + * The following heuristic is cheap, but beyond ugly... > > + */ > > + damage = damage < 52 ? 1 : damage / 52; > > If 52 is just as good as any number around 50-70 range, I would prefer to > see 64, just because I am superstitious and dividing by a power of two > feels nicer. Will do. > > +cat <<EOF >expect_diff_dirstat_CC > > + 16.7% changed/ > > + 16.7% dst/copy/changed/ > > + 16.7% dst/copy/rearranged/ > > + 16.7% dst/move/changed/ > > + 16.7% dst/move/rearranged/ > > + 16.7% rearranged/ > > +EOF > > I really wish you can come up with a way to express expected results in > much less strict way in the test vector (not limited to the test vectors > for this patch but for the entire series). The underlying count-damages > (for the purpose of rename detection) implementation may improve over > time and the textual diff generation may too. Here what we want to > preserve is that these six entries show more-or-less the same amount of > contribution, not precisely 16.666666% each. Yeah, that does make sense, although I haven't yet thought of a good way to do this without losing important details. I thought of assigning letters to each percentage value, so that instead of cat <<EOF >expect_diff_dirstat_M 5.3% changed/ 26.3% dst/copy/changed/ 26.3% dst/copy/rearranged/ 26.3% dst/copy/unchanged/ 5.3% dst/move/changed/ 5.3% dst/move/rearranged/ 5.3% rearranged/ EOF cat <<EOF >expect_diff_dirstat_CC 16.7% changed/ 16.7% dst/copy/changed/ 16.7% dst/copy/rearranged/ 16.7% dst/move/changed/ 16.7% dst/move/rearranged/ 16.7% rearranged/ EOF you'd have cat <<EOF >expect_diff_dirstat_M A% changed/ B% dst/copy/changed/ B% dst/copy/rearranged/ B% dst/copy/unchanged/ A% dst/move/changed/ A% dst/move/rearranged/ A% rearranged/ EOF cat <<EOF >expect_diff_dirstat_CC A% changed/ A% dst/copy/changed/ A% dst/copy/rearranged/ A% dst/move/changed/ A% dst/move/rearranged/ A% rearranged/ EOF but that would lose too much detail in many cases... Also, I partly like the fact that we test the output strictly. That way, when someone _does_ change the underlying details, they get an immediate test failure telling them exactly _how_ the numbers changed. They can then use that to verify that their changes produce the expected results, before finally updating the test in accordance with their new numbers. ...Johan -- Johan Herland, <johan@xxxxxxxxxxx> www.herland.net -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html