Hi Siddarth, On Sat, 9 Jul 2022, Siddharth Asthana wrote: > diff --git a/revision.c b/revision.c > index 211352795c..1939c56c67 100644 > --- a/revision.c > +++ b/revision.c > @@ -3792,14 +3791,42 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st > ident.mail_end - ident.name_begin + 1, > namemail.buf, namemail.len); > > + newlen = namemail.len; > + > strbuf_release(&namemail); > > - return 1; > + return newlen - (ident.mail_end - ident.name_begin + 1); > } > > return 0; > } > > +static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap) > +{ > + size_t buf_offset = 0; > + > + if (!mailmap) > + return; > + > + for (;;) { > + const char *person, *line; > + size_t i, linelen; > + > + line = buf->buf + buf_offset; > + linelen = strchrnul(line, '\n') - line + 1; > + > + if (linelen <= 1) > + /* End of header */ > + return; This conditional would probably read much better if it was moved up a few lines and rewritten like this: if (!*line || *line == '\n') return; /* End of headers */ or even turning the `for (;;)` into while (buf->buf[buf_offset] && buf->buf[buf_offset] != '\n') > + > + buf_offset += linelen; I would prefer to avoid having `linelen` altogether, and instead move the `buf_offset` assignment _after_ the loop that handles the current header line (and _not_ modify `buf_offset` inside): buf_offset = strchrnul(buf->buf + buf_offset, '\n'); if (buf->buf[buf_offset] == '\n') buf_offset++; > + > + for (i = 0; headers[i]; i++) > + if (skip_prefix(line, headers[i], &person)) > + buf_offset += rewrite_ident_line(person, buf, mailmap); At this point, we have handled the header and should _not_ continue the (inner) `for` loop. This is important because `line` is potentially invalidated by that `rewrite_ident_line()` call. See below for a patch (which is on top of `shears/seen`, but you get the idea. This issue could also be avoided by consistently using `buf->buf + buf_offset` instead of `line`. > + } > +} > + > static int commit_match(struct commit *commit, struct rev_info *opt) > { > int retval; -- snipsnap -- >From 61dd169def195eee9827a9a670f8dab606279cea Mon Sep 17 00:00:00 2001 From: Johannes Schindelin <johannes.schindelin@xxxxxx> Date: Tue, 12 Jul 2022 15:10:35 +0200 Subject: [PATCH] fixup??? revision: improve commit_rewrite_person() When the `linux-musl` job failed in t4203.44 with a segmentation fault, I became suspicious. From https://github.com/git-for-windows/git/runs/7301741954?check_suite_focus=true#step:5:1775: + test_config mailmap.file complex.map + config_dir= + test mailmap.file '=' -C + test_when_finished 'test_unconfig '"'"'mailmap.file'"'" + test 0 '=' 0 + test_cleanup='{ test_unconfig '"'"'mailmap.file'"'"' } && (exit "$eval_ret"); eval_ret=$?; :' + git config mailmap.file complex.map + git log --use-mailmap --author '<cto@xxxxxxxxxxx>' Segmentation fault (core dumped) error: last command exited with $?=139 I suspected a memory corruption, and my go-to tool to investigate those is `valgrind`, so I ran `t4203-*.sh -ivx --valgrind-only=44`. It reported the following: -- snip -- [...] expecting success of 4203.44 'Only grep replaced author with --use-mailmap': test_config mailmap.file complex.map && git log --use-mailmap --author "<cto@xxxxxxxxxxx>" >actual && test_must_be_empty actual + test_config mailmap.file complex.map + config_dir= + test mailmap.file = -C + test_when_finished test_unconfig 'mailmap.file' + test 0 = 0 + test_cleanup={ test_unconfig 'mailmap.file' } && (exit "$eval_ret"); eval_ret=$?; : + git config mailmap.file complex.map + git log --use-mailmap --author <cto@xxxxxxxxxxx> ==14374== Invalid read of size 1 ==14374== at 0x2EE384: skip_prefix (git-compat-util.h:676) ==14374== by 0x2EF31D: apply_mailmap_to_header (ident.c:417) ==14374== by 0x3BB045: commit_match (revision.c:3831) ==14374== by 0x3BB389: get_commit_action (revision.c:3917) ==14374== by 0x3BB934: simplify_commit (revision.c:4005) ==14374== by 0x3BBCAD: get_revision_1 (revision.c:4083) ==14374== by 0x3BBEF0: get_revision_internal (revision.c:4184) ==14374== by 0x3BC192: get_revision (revision.c:4262) ==14374== by 0x1A0B05: cmd_log_walk_no_free (log.c:454) ==14374== by 0x1A0BCD: cmd_log_walk (log.c:496) ==14374== by 0x1A1E90: cmd_log (log.c:818) ==14374== by 0x129A04: run_builtin (git.c:466) ==14374== Address 0x4b76f4e is 94 bytes inside a block of size 210 free'd ==14374== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==14374== by 0x42F908: xrealloc (wrapper.c:136) ==14374== by 0x3E6742: strbuf_grow (strbuf.c:99) ==14374== by 0x3E6F1F: strbuf_splice (strbuf.c:242) ==14374== by 0x2EF220: rewrite_ident_line (ident.c:382) ==14374== by 0x2EF338: apply_mailmap_to_header (ident.c:418) ==14374== by 0x3BB045: commit_match (revision.c:3831) ==14374== by 0x3BB389: get_commit_action (revision.c:3917) ==14374== by 0x3BB934: simplify_commit (revision.c:4005) ==14374== by 0x3BBCAD: get_revision_1 (revision.c:4083) ==14374== by 0x3BBEF0: get_revision_internal (revision.c:4184) ==14374== by 0x3BC192: get_revision (revision.c:4262) ==14374== Block was alloc'd at ==14374== at 0x483B723: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==14374== by 0x483E017: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==14374== by 0x42F908: xrealloc (wrapper.c:136) ==14374== by 0x3E6742: strbuf_grow (strbuf.c:99) ==14374== by 0x3E733D: strbuf_add (strbuf.c:298) ==14374== by 0x3AF420: strbuf_addstr (strbuf.h:305) ==14374== by 0x3BB027: commit_match (revision.c:3829) ==14374== by 0x3BB389: get_commit_action (revision.c:3917) ==14374== by 0x3BB934: simplify_commit (revision.c:4005) ==14374== by 0x3BBCAD: get_revision_1 (revision.c:4083) ==14374== by 0x3BBEF0: get_revision_internal (revision.c:4184) ==14374== by 0x3BC192: get_revision (revision.c:4262) ==14374== { <insert_a_suppression_name_here> Memcheck:Addr1 fun:skip_prefix fun:apply_mailmap_to_header fun:commit_match fun:get_commit_action fun:simplify_commit fun:get_revision_1 fun:get_revision_internal fun:get_revision fun:cmd_log_walk_no_free fun:cmd_log_walk fun:cmd_log fun:run_builtin } error: last command exited with $?=126 not ok 44 - Only grep replaced author with --use-mailmap 1..44 -- snap -- And indeed, we see that the `rewrite_ident_line()` function grows the strbuf, which can (and does, in this instance) move the buffer to a new address, which invalidates the `line` pointer, which still points at the old address. It might actually make sense to rewrite the entire part of the original patch where it looks for the end of the header line, so that it avoids working on pointers altogether, and uses offsets instead. Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx> --- ident.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/ident.c b/ident.c index 5f17bd607dd..fbcf7250aab 100644 --- a/ident.c +++ b/ident.c @@ -414,8 +414,10 @@ void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct st buf_offset += linelen; for (i = 0; headers[i]; i++) - if (skip_prefix(line, headers[i], &person)) + if (skip_prefix(line, headers[i], &person)) { buf_offset += rewrite_ident_line(person, buf, mailmap); + break; + } } } -- 2.37.0.rc2.windows.1.7.g45a475aeb84