Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> Satisfied?

Very much.

FWIW almost the same procedure led to the weights 0, 2, 1 and 4 that
you see in help.c. The weights are basically factors with which
mistakes are punished: if you just confuse two adjacent letters, such
as "psuh" instead of "push" (which can be quite common if you use two
hands, one on the left side, and one on the right side of the
keyboard, with an en-US layout so many of us use, myself included) it
costs 0.

If you write a different character than what you intended, the cost
is 2.

The idea behind it is that you're more likely to miss a key than to
hit the wrong key. With the laptop I am typing this email on, it is
particularly likely that I miss a key, because there are certain key
combinations where only the first key triggers an input event, but
the second only triggers an input event when it is _released_ after
the first one. So when I type "er" real fast and happen to release
the "e" key after the "r" key, no "r" appears on my screen.

Okay, so the weight for adding a character must be smaller than
substituting a character, but why is the cost for deletion so high?

Well, I really rarely type unnecessary characters (except when
writing to the Git mailing list, arguably) so those costs must be
substantially higher than for typing the wrong character.

These are actually very good justifications in the sense that people
who might want to tweak the heuristics can see the reason behind the
current choice and agree or disagree with it.

I somehow suspect that a good mathematician can come up with a
rationale for 6 after the fact that sounds convincing, along the
lines of "the average length of commands being N, and levenshtein
penalties being <0,2,1,4>, you can insert X mistaken keystroke and/or
omit Y mistaken keystroke per every correct keystroke without
exceeding this value 6, and the percentage X and/or Y represents is
not too low to be practical but low enough to reject false
positives".

In any case, I'll further squash in the following.

Thanks for an amusing explanation ;-).

diff --git a/help.c b/help.c
index fbf80d9..de1e2ea 100644
--- a/help.c
+++ b/help.c
@@ -297,7 +297,7 @@ static void add_cmd_list(struct cmdnames *cmds, struct cmdnames *old)
 	old->names = NULL;
 }
 
-/* how did we decide this is a good cutoff??? */
+/* An empirically derived magic number */
 #define SIMILAR_ENOUGH(x) ((x) < 6)
 
 const char *help_unknown_cmd(const char *cmd)
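
To make the interplay of the weights <0,2,1,4> and the cutoff 6 a bit
more concrete, here is a minimal sketch of a weighted edit distance
with an adjacent-swap rule. It is not the code in help.c; the function
name weighted_distance, its parameter order and the full-matrix layout
are assumptions made purely for illustration. Only SIMILAR_ENOUGH is
taken from the patch above.

```c
#include <stdlib.h>
#include <string.h>

/* Cutoff from the patch: a candidate must score below 6. */
#define SIMILAR_ENOUGH(x) ((x) < 6)

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper (not git's API): cost of turning what the user
 * typed into the command they wanted.
 *   swap  - exchange two adjacent characters ("psuh" -> "push")
 *   subst - replace a wrong character with the right one
 *   add   - insert a character the user missed
 *   del   - drop an unnecessary character the user typed
 * Returns -1 if memory cannot be allocated.
 */
int weighted_distance(const char *typed, const char *wanted,
		      int swap, int subst, int add, int del)
{
	size_t n = strlen(typed), m = strlen(wanted), i, j;
	size_t cols = m + 1;
	int *d = malloc((n + 1) * cols * sizeof(*d));
	int result;

	if (!d)
		return -1;

	/* Empty input: every wanted character has to be added. */
	for (j = 0; j <= m; j++)
		d[j] = (int)j * add;

	for (i = 1; i <= n; i++) {
		/* Empty target: every typed character has to be deleted. */
		d[i * cols] = (int)i * del;
		for (j = 1; j <= m; j++) {
			/* match (free) or substitution */
			int best = d[(i - 1) * cols + (j - 1)] +
				(typed[i - 1] != wanted[j - 1] ? subst : 0);

			/* two adjacent characters typed in the wrong order */
			if (i > 1 && j > 1 &&
			    typed[i - 2] == wanted[j - 1] &&
			    typed[i - 1] == wanted[j - 2])
				best = min_int(best,
					       d[(i - 2) * cols + (j - 2)] + swap);

			/* deletion of an extra typed character */
			best = min_int(best, d[(i - 1) * cols + j] + del);
			/* insertion of a missed character */
			best = min_int(best, d[i * cols + (j - 1)] + add);

			d[i * cols + j] = best;
		}
	}

	result = d[n * cols + m];
	free(d);
	return result;
}
```

Called with the weights 0, 2, 1, 4 against "push", this sketch gives 0
for "psuh" (one swap), 1 for "pus" (one missed key), 2 for "pash" (one
wrong key) and 4 for "pushh" (one extra key), all of which pass
SIMILAR_ENOUGH. "pushhh" needs two deletions, costs 8, and is rejected.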