Re: [PATCH v7 2/2] name-rev.c: use strbuf_getline instead of limited size buffer

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 18 Jan 2022 17:09:37 +0100

On Wed, Jan 05 2022, John Cai via GitGitGadget wrote:

> From: John Cai <johncai86@xxxxxxxxx>
>
> Using a buffer limited to 2048 is unnecessarily limiting. Switch to
> using a string buffer to read in stdin for annotation.
>
> Signed-off-by: "John Cai" <johncai86@xxxxxxxxx>
> ---
>  builtin/name-rev.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 8baf5b52d0b..138e3c30a2b 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -623,14 +623,13 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
>  	name_tips();
>  
>  	if (annotate_stdin) {
> -		char buffer[2048];
> +		struct strbuf sb = STRBUF_INIT;
>  
> -		while (!feof(stdin)) {
> -			char *p = fgets(buffer, sizeof(buffer), stdin);
> -			if (!p)
> -				break;
> -			name_rev_line(p, &data);
> +		while (strbuf_getline(&sb, stdin) != EOF) {
> +			strbuf_addch(&sb, '\n');
> +			name_rev_line(sb.buf, &data);
>  		}
> +		strbuf_release(&sb);
>  	} else if (all) {
>  		int i, max;

Maybe there's a subtlety with \r in line endings (Windows), since
strbuf_getline() also strips a CR before the LF, but isn't this doing
the same thing as:

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 138e3c30a2b..03dbf251450 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -625,10 +625,8 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
 	if (annotate_stdin) {
 		struct strbuf sb = STRBUF_INIT;
 
-		while (strbuf_getline(&sb, stdin) != EOF) {
-			strbuf_addch(&sb, '\n');
+		while (!strbuf_getwholeline(&sb, stdin, '\n'))
 			name_rev_line(sb.buf, &data);
-		}
 		strbuf_release(&sb);
 	} else if (all) {
 		int i, max;

After writing that I see this was changed on the basis of Junio's
feedback in https://lore.kernel.org/git/xmqqr19ofdo5.fsf@gitster.g/ :)
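To make the \r subtlety concrete, here is a minimal standalone sketch.
The helpers next_line_whole() and next_line_crlf() are hypothetical: they
only mimic, on an in-memory string, what each loop would hand to
name_rev_line(), and are not git's strbuf API. The CRLF handling assumes
strbuf_getline() drops a CR only when it directly precedes the LF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helpers, NOT git's strbuf API: each copies the next line
 * of the in-memory string `in` into `out` and returns how many input
 * bytes it consumed (0 at end of input).
 */

/* Mimics strbuf_getwholeline(&sb, fp, '\n'): the line is kept verbatim,
 * including its LF (and any CR before it). */
static size_t next_line_whole(const char *in, char *out, size_t outsz)
{
	const char *nl = strchr(in, '\n');
	size_t len = nl ? (size_t)(nl - in) + 1 : strlen(in);

	assert(len < outsz);
	memcpy(out, in, len);
	out[len] = '\0';
	return len;
}

/* Mimics strbuf_getline() followed by strbuf_addch(&sb, '\n'): the LF and
 * a CR directly before it are dropped, then a single LF is appended. */
static size_t next_line_crlf(const char *in, char *out, size_t outsz)
{
	const char *nl = strchr(in, '\n');
	size_t consumed = nl ? (size_t)(nl - in) + 1 : strlen(in);
	size_t len = nl ? (size_t)(nl - in) : consumed;

	if (!consumed)
		return 0; /* end of input: no line, so nothing is appended */
	if (nl && len && in[len - 1] == '\r')
		len--;
	assert(len + 1 < outsz);
	memcpy(out, in, len);
	out[len++] = '\n';
	out[len] = '\0';
	return consumed;
}
```

So the whole-line variant passes "\r\n" through unchanged, as the original
fgets() loop did, while the strbuf_getline() + strbuf_addch() variant turns
CRLF into LF and also appends a LF to a final line that lacked one.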

FWIW I think it's fine as-is, but it also seems that name_rev_line()
doesn't really care about lines per se, only that we don't split OIDs
across "lines" (as we'll call get_oid() on the hex runs we find in
them). So e.g. splitting on ' ' (spaces) would also work here, but
splitting on 'a' would not, as that could split an OID, which is made
up of [0-9a-f].
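A rough sketch of that point, assuming SHA-1 (40 hex digits) and a
simplified scanner modelled on the hex-run counter in name_rev_line().
count_hex_runs() and count_split() are hypothetical helpers, and the
sketch only counts candidate OIDs rather than calling get_oid(). Each
chunk keeps its separator, as strbuf_getwholeline() keeps the delimiter.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define HEXSZ 40 /* assumption: SHA-1 object names */

/* Lowercase-hex test, like the ishex() macro in name_rev_line(). */
static int ishex(char c)
{
	return isdigit((unsigned char)c) || (c >= 'a' && c <= 'f');
}

/* Count runs of exactly HEXSZ hex digits in s[0..len), i.e. the spots
 * where name_rev_line() would try get_oid(). */
static int count_hex_runs(const char *s, size_t len)
{
	int found = 0, counter = 0;

	for (size_t i = 0; i < len; i++) {
		if (!ishex(s[i]))
			counter = 0;
		else if (++counter == HEXSZ &&
			 (i + 1 == len || !ishex(s[i + 1]))) {
			found++;
			counter = 0;
		}
	}
	return found;
}

/* Split s on sep (keeping sep at the end of each chunk) and scan each
 * chunk on its own, as if it had been read as a separate "line". */
static int count_split(const char *s, char sep)
{
	int total = 0;
	const char *start = s;

	assert(sep != '\0');
	for (const char *p = s; ; p++) {
		if (*p == sep || *p == '\0') {
			size_t len = (size_t)(p - start) + (*p == sep);

			total += count_hex_runs(start, len);
			if (*p == '\0')
				break;
			start = p + 1;
		}
	}
	return total;
}
```

For a two-line input holding two OIDs, splitting on '\n' or ' ' finds both,
while splitting on 'a' finds none, since every 'a' inside an OID cuts it
short.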