On Fri, Jul 22 2022, Taylor Blau wrote: [I think John Cai would appreciate being CC'd on this] > It's tempting to think that we could use `strbuf_getwholeline()` and > specify either `\n` or `\0` as the terminating character. But for input > on platforms that include a CR character preceeding the LF, this > wouldn't quite be the same, since `strbuf_getline(...)` will trim any > trailing CR, while `strbuf_getwholeline(&buf, stdin, '\n')` will not. I commend the effort to maintain bug-for-bug compatibility, but do we care about this distinction, or is it just an accident that we support \r\n here in the first place? This doesn't apply to the rest of cat-file directly, but the origin of the recent --batch-command mode cdoe was to lift the same-ish code from builtin/update-ref.c, whose \n or \0 mode does exactly that: while (!strbuf_getwholeline(&input, stdin, line_termination)) { I.e. it doesn't support \r\n, just \n or \0. Isn't that fine? I may be missing something, but why isn't it OK even on platforms that use \r\n for their normal *.txt endings to only use \n or \0 for this bit of protocol? For the command mode at least this passes our tests: diff --git a/builtin/cat-file.c b/builtin/cat-file.c index f42782e955f..8646059472d 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -614,12 +614,16 @@ static void batch_objects_command(struct batch_options *opt, struct queued_cmd *queued_cmd = NULL; size_t alloc = 0, nr = 0; - while (!strbuf_getline(&input, stdin)) { + while (!strbuf_getwholeline(&input, stdin, '\n')) { int i; const struct parse_cmd *cmd = NULL; const char *p = NULL, *cmd_end; struct queued_cmd call = {0}; + if (input.len > 0 && input.buf[input.len - 1] == '\n') + --input.len; + input.buf[input.len] = '\0'; + if (!input.len) die(_("empty command in input")); if (isspace(*input.buf)) So maybe we should just do something like that instead? I.e. declare that a mistake. As for the rest of cat-file 05d5667fec9 (git-cat-file: Add --batch-check option, 2008-04-23) documents that it's LF, not CR LF, ditto git-cat-file.txt. So isn't this just an accident in of us having used the strbuf_getline() function to mean "\n", but actually it also does "\r\n". Which is a really unfortunately named function b.t.w., since it sneaks this bit of Windows portability into places that may not want it in the first place. > strlen () { > echo_without_newline "$1" | wc -c | sed -e 's/^ *//' > } > @@ -398,6 +403,12 @@ test_expect_success '--batch with multiple sha1s gives correct format' ' > test "$(maybe_remove_timestamp "$batch_output" 1)" = "$(maybe_remove_timestamp "$(echo_without_newline "$batch_input" | git cat-file --batch)" 1)" > ' > > +test_expect_success '--batch, -z with multiple sha1s gives correct format' ' > + echo_without_newline_nul "$batch_input" >in && > + test "$(maybe_remove_timestamp "$batch_output" 1)" = \ > + "$(maybe_remove_timestamp "$(git cat-file --batch -z <in)" 1)" This... > +' > + > batch_check_input="$hello_sha1 > $tree_sha1 > $commit_sha1 > @@ -418,6 +429,24 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" ' > "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)" > ' > > +test_expect_success "--batch-check, -z with multiple sha1s gives correct format" ' > + echo_without_newline_nul "$batch_check_input" >in && > + test "$batch_check_output" = "$(git cat-file --batch-check -z <in)" This.... > +' > + > +test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' ' > + touch -- "newline${LF}embedded" && > + git add -- "newline${LF}embedded" && > + git commit -m "file with newline embedded" && > + test_tick && > + > + printf "HEAD:newline${LF}embedded" >in && > + git cat-file --batch-check -z <in >actual && > + > + echo "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect && ..and this hides git's exit code, better to pipe to a file, use test_cmp etc. etc.