Re: [PATCH] help: always suggest common-cmds if prefix of cmd

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 24 Nov 2010 11:49:58 -0800

Erik Faye-Lund <kusmabite@xxxxxxxxx> writes:

> @@ -320,9 +321,16 @@ const char *help_unknown_cmd(const char *cmd)
>  	uniq(&main_cmds);
>  
>  	/* This reuses cmdname->len for similarity index */
> +	for (i = 0; i < main_cmds.cnt; ++i) {
> +		main_cmds.names[i]->len = 1 +
>  			levenshtein(cmd, main_cmds.names[i]->name, 0, 2, 1, 4);
> +		for (n = 0; n < ARRAY_SIZE(common_cmds); ++n) {
> +			if (!strcmp(main_cmds.names[i]->name,
> +			    common_cmds[n].name) &&
> +			    !prefixcmp(main_cmds.names[i]->name, cmd))
> +				main_cmds.names[i]->len = 0;
> +		}
> +	}

So main_cmds.names[]->len (which is not "len" anymore at this point but is
just a "score") gets 1 + the levenshtein distance (i.e. a smaller number
indicates cmd is more likely to be a typo of it), and in addition
->len == 0 means "cmd is a prefix of this common command".  Overall, the
smaller the score, the likelier the match.
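
To make the scoring rule concrete, here is a minimal standalone sketch.
The names starts_with() and score_candidate() are made up for
illustration; the patch itself uses prefixcmp() and calls levenshtein()
directly, whereas this sketch takes the already-computed distance and an
"is it a common command" flag as parameters so that it compiles on its
own.

```c
#include <string.h>

/* Hypothetical stand-in for prefixcmp(): nonzero if name starts with prefix. */
static int starts_with(const char *name, const char *prefix)
{
	return strncmp(name, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical helper mirroring the scoring in the quoted hunk:
 * 0 when cmd is a prefix of a common command, otherwise
 * 1 + edit distance, so a non-prefix entry never scores 0.
 */
static int score_candidate(const char *cmd, const char *name,
			   int is_common, int distance)
{
	if (is_common && starts_with(name, cmd))
		return 0;
	return 1 + distance;
}
```

A prefix of a command that is not in common_cmds gets no special
treatment; it is scored by distance like everything else.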

> @@ -330,9 +338,12 @@ const char *help_unknown_cmd(const char *cmd)
>  	if (!main_cmds.cnt)
>  		die ("Uh oh. Your system reports no Git commands at all.");
>  
> -	best_similarity = main_cmds.names[0]->len;
> -	n = 1;
> -	while (n < main_cmds.cnt && best_similarity == main_cmds.names[n]->len)
> +	n = 0;
> +	do {
> +		best_similarity = main_cmds.names[n++]->len;
> +	} while (!best_similarity);

At this point, main_cmds.names[] is sorted by the above score (smaller to
larger), and first you skip all the "prefix" ones that score 0.

This relies on the fact that there is at least one entry with non-zero
score (otherwise the loop runs past the end of main_cmds.names[]), which
in practice is true, but without even a comment?  I feel dirty.

The score of the first non-prefix entry is in best_similarity and that
entry is at main_cmds.names[n-1] at this point.  You haven't checked
main_cmds.names[n] yet...

> +	n++;

... but you increment n to skip that entry without even looking, and then
go on to ...

> +	while (n < main_cmds.cnt && best_similarity >= main_cmds.names[n]->len)
>  		++n;

You skip the entries with the same similarity as the closest typo,
presumably to point n to the first entry that is irrelevant (i.e. entries
0 through n-1 are the candidates).
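
The effect of the unconditional increment is easier to see on a plain
array of scores.  The sketch below copies the control flow of the quoted
hunks into a hypothetical function, patched_count(), which works on a
sorted int array instead of main_cmds.names[]; it is an illustration of
my reading of the patch, not code from it.

```c
#include <stddef.h>

/*
 * Hypothetical reproduction of the patched loop.  scores[] is sorted in
 * ascending order and, like the patch, is assumed to contain at least
 * one non-zero entry (the do/while has no bounds check).
 */
static size_t patched_count(const int *scores, size_t cnt)
{
	size_t n = 0;
	int best_similarity;

	do {
		best_similarity = scores[n++];
	} while (!best_similarity);
	n++;	/* skips scores[n] without comparing it */
	while (n < cnt && best_similarity >= scores[n])
		++n;
	return n;
}
```

With scores {3, 7} (no prefix match, one clear best) this returns 2, and
with {0, 2, 5} it returns 3, so in both cases an entry that does not tie
with the best score is counted as a candidate.  Since n is at least 2
after the increment, the later "n == 1" autocorrect test would not seem
to fire at all.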

Your rewrite of the loop makes it very hard to read and spot bugs, I
think.
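
For comparison, a more explicit shape for the same intent might look
like the sketch below.  count_candidates() is a hypothetical name, it
operates on a sorted int array rather than main_cmds.names[], and it is
only one possible arrangement, not what the patch does: skip the
zero-score prefix entries with a bounds check, read the best non-zero
score, then count only exact ties with it.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not from the patch.  scores[] must be sorted in
 * ascending order; 0 marks a prefix match.  Returns how many leading
 * entries are candidates and stores the best non-zero score in *best
 * (0 if every entry is a prefix match).
 */
static size_t count_candidates(const int *scores, size_t cnt, int *best)
{
	size_t n = 0;

	*best = 0;
	/* Prefix matches always count; stop at the end of the array. */
	while (n < cnt && !scores[n])
		n++;
	if (n < cnt)
		*best = scores[n];
	/* Count the closest typo and everything that ties with it. */
	while (n < cnt && scores[n] == *best)
		n++;
	return n;
}
```

With scores {3, 7} this yields 1 candidate, and with {0, 2, 5} it yields
2, so entries 0 through n-1 are exactly the prefix matches plus the
closest typos.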

>  	if (autocorrect && n == 1 && SIMILAR_ENOUGH(best_similarity)) {
>  		const char *assumed = main_cmds.names[0]->name;
> -- 
> 1.7.3.2