On Wed, Nov 24, 2010 at 8:49 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Erik Faye-Lund <kusmabite@xxxxxxxxx> writes: > >> @@ -320,9 +321,16 @@ const char *help_unknown_cmd(const char *cmd) >> uniq(&main_cmds); >> >> /* This reuses cmdname->len for similarity index */ >> + for (i = 0; i < main_cmds.cnt; ++i) { >> + main_cmds.names[i]->len = 1 + >> levenshtein(cmd, main_cmds.names[i]->name, 0, 2, 1, 4); >> + for (n = 0; n < ARRAY_SIZE(common_cmds); ++n) { >> + if (!strcmp(main_cmds.names[i]->name, >> + common_cmds[n].name) && >> + !prefixcmp(main_cmds.names[i]->name, cmd)) >> + main_cmds.names[i]->len = 0; >> + } >> + } > > So main_cmds.names[]->len (which is not "len" anymore at this point but is > just a "score") gets levenshtein distance (i.e. a smaller number indicates > cmd is more likely to be a typo of it), and in addition ->len == 0 is "it > is prefix". Overall, the smaller the score, the likelier the match. > Correct. This was already the case, though. I just reserved len = 0 as "this is a prefix of the entered command". >> @@ -330,9 +338,12 @@ const char *help_unknown_cmd(const char *cmd) >> if (!main_cmds.cnt) >> die ("Uh oh. Your system reports no Git commands at all."); >> >> - best_similarity = main_cmds.names[0]->len; >> - n = 1; >> - while (n < main_cmds.cnt && best_similarity == main_cmds.names[n]->len) >> + n = 0; >> + do { >> + best_similarity = main_cmds.names[n++]->len; >> + } while (!best_similarity); > > At this point, main_cmds.names[] is sorted by the above score (smaller to > larger), and first you skip all the "prefix" ones that score 0. > > This relies on the fact that there is at least one entry with non-zero > score, which in practice is true, but without even a comment? I feel > dirty. > Ah, yes. This needs clearer code badly. I'll see what I can cook up. > The score of the first non-prefix entry is in best_similarity and that > entry is at main_cmds.names[n-1] at this point. You haven't checked > main_cmds.names[n] yet... > >> + n++; > > ... but you increment n to skip that entry without even looking, and then > go on to ... > This is a bug, thanks for spotting it. It was intended to be the same as "i = 1" in the old version, but I didn't think about it being increased in the previous loop as well, silly me. >> + while (n < main_cmds.cnt && best_similarity >= main_cmds.names[n]->len) >> ++n; > > You skip the entries with the same similarity as the closest typo, > presumably to point n to the first entry that is irrelevant (i.e. 0 thru n > but not including n are candidates). > > Your rewrite of the loop makes it very hard to read and spot bugs, I > think. > Indeed. What about this intra-diff? Hopefully it's a bit clearer, as it's closer to the original, just reusing the same logic for the new similar loop... Also makes the final diff smaller, which is nice. diff --git a/help.c b/help.c index dc76a62..d02a019 100644 --- a/help.c +++ b/help.c @@ -339,11 +339,10 @@ const char *help_unknown_cmd(const char *cmd) die ("Uh oh. Your system reports no Git commands at all."); n = 0; - do { - best_similarity = main_cmds.names[n++]->len; - } while (!best_similarity); - n++; - while (n < main_cmds.cnt && best_similarity >= main_cmds.names[n]->len) + while (n < main_cmds.cnt && !main_cmds.names[n]->len) + ++n; + best_similarity = main_cmds.names[n++]->len; + while (n < main_cmds.cnt && best_similarity == main_cmds.names[n]->len) ++n; if (autocorrect && n == 1 && SIMILAR_ENOUGH(best_similarity)) { const char *assumed = main_cmds.names[0]->name; -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html