Re: [PATCH v3 1/4] revision: improve commit_rewrite_person()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Siddarth,

On Sat, 9 Jul 2022, Siddharth Asthana wrote:

> diff --git a/revision.c b/revision.c
> index 211352795c..1939c56c67 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -3792,14 +3791,42 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
>  			      ident.mail_end - ident.name_begin + 1,
>  			      namemail.buf, namemail.len);
>
> +		newlen = namemail.len;
> +
>  		strbuf_release(&namemail);
>
> -		return 1;
> +		return newlen - (ident.mail_end - ident.name_begin + 1);
>  	}
>
>  	return 0;
>  }
>
> +static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
> +{
> +	size_t buf_offset = 0;
> +
> +	if (!mailmap)
> +		return;
> +
> +	for (;;) {
> +		const char *person, *line;
> +		size_t i, linelen;
> +
> +		line = buf->buf + buf_offset;
> +		linelen = strchrnul(line, '\n') - line + 1;
> +
> +		if (linelen <= 1)
> +			/* End of header */
> +			return;

This conditional would probably read much better if it was moved up a few
lines and rewritten like this:

	if (!*line || *line == '\n')
		return; /* End of headers */

or even turning the `for (;;)` into

	while (buf->buf[buf_offset] && buf->buf[buf_offset] != '\n')

> +
> +		buf_offset += linelen;

I would prefer to avoid having `linelen` altogether, and instead move the
`buf_offset` assignment _after_ the loop that handles the current header
line (and _not_ modify `buf_offset` inside):

	buf_offset = strchrnul(buf->buf + buf_offset, '\n');
	if (buf->buf[buf_offset] == '\n')
		buf_offset++;

> +
> +		for (i = 0; headers[i]; i++)
> +			if (skip_prefix(line, headers[i], &person))
> +				buf_offset += rewrite_ident_line(person, buf, mailmap);

At this point, we have handled the header and should _not_ continue the
(inner) `for` loop. This is important because `line` is potentially
invalidated by that `rewrite_ident_line()` call. See below for a patch
(which is on top of `shears/seen`, but you get the idea.

This issue could also be avoided by consistently using `buf->buf +
buf_offset` instead of `line`.

> +	}
> +}
> +
>  static int commit_match(struct commit *commit, struct rev_info *opt)
>  {
>  	int retval;

-- snipsnap --
>From 61dd169def195eee9827a9a670f8dab606279cea Mon Sep 17 00:00:00 2001
From: Johannes Schindelin <johannes.schindelin@xxxxxx>
Date: Tue, 12 Jul 2022 15:10:35 +0200
Subject: [PATCH] fixup??? revision: improve commit_rewrite_person()

When the `linux-musl` job failed in t4203.44 with a segmentation fault,
I became suspicious. From
https://github.com/git-for-windows/git/runs/7301741954?check_suite_focus=true#step:5:1775:

  + test_config mailmap.file complex.map
  + config_dir=
  + test mailmap.file '=' -C
  + test_when_finished 'test_unconfig  '"'"'mailmap.file'"'"
  + test 0 '=' 0
  + test_cleanup='{ test_unconfig  '"'"'mailmap.file'"'"'
  		} && (exit "$eval_ret"); eval_ret=$?; :'
  + git config mailmap.file complex.map
  + git log --use-mailmap --author '<cto@xxxxxxxxxxx>'
  Segmentation fault (core dumped)
  error: last command exited with $?=139

I suspected a memory corruption, and my go-to tool to investigate those
is `valgrind`, so I ran `t4203-*.sh -ivx --valgrind-only=44`. It
reported the following:

-- snip --
[...]
expecting success of 4203.44 'Only grep replaced author with --use-mailmap':
	test_config mailmap.file complex.map &&
	git log --use-mailmap --author "<cto@xxxxxxxxxxx>" >actual &&
	test_must_be_empty actual

+ test_config mailmap.file complex.map
+ config_dir=
+ test mailmap.file = -C
+ test_when_finished test_unconfig  'mailmap.file'
+ test 0 = 0
+ test_cleanup={ test_unconfig  'mailmap.file'
		} && (exit "$eval_ret"); eval_ret=$?; :
+ git config mailmap.file complex.map
+ git log --use-mailmap --author <cto@xxxxxxxxxxx>
==14374== Invalid read of size 1
==14374==    at 0x2EE384: skip_prefix (git-compat-util.h:676)
==14374==    by 0x2EF31D: apply_mailmap_to_header (ident.c:417)
==14374==    by 0x3BB045: commit_match (revision.c:3831)
==14374==    by 0x3BB389: get_commit_action (revision.c:3917)
==14374==    by 0x3BB934: simplify_commit (revision.c:4005)
==14374==    by 0x3BBCAD: get_revision_1 (revision.c:4083)
==14374==    by 0x3BBEF0: get_revision_internal (revision.c:4184)
==14374==    by 0x3BC192: get_revision (revision.c:4262)
==14374==    by 0x1A0B05: cmd_log_walk_no_free (log.c:454)
==14374==    by 0x1A0BCD: cmd_log_walk (log.c:496)
==14374==    by 0x1A1E90: cmd_log (log.c:818)
==14374==    by 0x129A04: run_builtin (git.c:466)
==14374==  Address 0x4b76f4e is 94 bytes inside a block of size 210 free'd
==14374==    at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==14374==    by 0x42F908: xrealloc (wrapper.c:136)
==14374==    by 0x3E6742: strbuf_grow (strbuf.c:99)
==14374==    by 0x3E6F1F: strbuf_splice (strbuf.c:242)
==14374==    by 0x2EF220: rewrite_ident_line (ident.c:382)
==14374==    by 0x2EF338: apply_mailmap_to_header (ident.c:418)
==14374==    by 0x3BB045: commit_match (revision.c:3831)
==14374==    by 0x3BB389: get_commit_action (revision.c:3917)
==14374==    by 0x3BB934: simplify_commit (revision.c:4005)
==14374==    by 0x3BBCAD: get_revision_1 (revision.c:4083)
==14374==    by 0x3BBEF0: get_revision_internal (revision.c:4184)
==14374==    by 0x3BC192: get_revision (revision.c:4262)
==14374==  Block was alloc'd at
==14374==    at 0x483B723: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==14374==    by 0x483E017: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==14374==    by 0x42F908: xrealloc (wrapper.c:136)
==14374==    by 0x3E6742: strbuf_grow (strbuf.c:99)
==14374==    by 0x3E733D: strbuf_add (strbuf.c:298)
==14374==    by 0x3AF420: strbuf_addstr (strbuf.h:305)
==14374==    by 0x3BB027: commit_match (revision.c:3829)
==14374==    by 0x3BB389: get_commit_action (revision.c:3917)
==14374==    by 0x3BB934: simplify_commit (revision.c:4005)
==14374==    by 0x3BBCAD: get_revision_1 (revision.c:4083)
==14374==    by 0x3BBEF0: get_revision_internal (revision.c:4184)
==14374==    by 0x3BC192: get_revision (revision.c:4262)
==14374==
{
   <insert_a_suppression_name_here>
   Memcheck:Addr1
   fun:skip_prefix
   fun:apply_mailmap_to_header
   fun:commit_match
   fun:get_commit_action
   fun:simplify_commit
   fun:get_revision_1
   fun:get_revision_internal
   fun:get_revision
   fun:cmd_log_walk_no_free
   fun:cmd_log_walk
   fun:cmd_log
   fun:run_builtin
}
error: last command exited with $?=126
not ok 44 - Only grep replaced author with --use-mailmap
1..44
-- snap --

And indeed, we see that the `rewrite_ident_line()` function grows the
strbuf, which can (and does, in this instance) move the buffer to a new
address, which invalidates the `line` pointer, which still points at the
old address.

It might actually make sense to rewrite the entire part of the original
patch where it looks for the end of the header line, so that it avoids
working on pointers altogether, and uses offsets instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
---
 ident.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ident.c b/ident.c
index 5f17bd607dd..fbcf7250aab 100644
--- a/ident.c
+++ b/ident.c
@@ -414,8 +414,10 @@ void apply_mailmap_to_header(struct strbuf *buf, const char **headers, struct st
 		buf_offset += linelen;

 		for (i = 0; headers[i]; i++)
-			if (skip_prefix(line, headers[i], &person))
+			if (skip_prefix(line, headers[i], &person)) {
 				buf_offset += rewrite_ident_line(person, buf, mailmap);
+				break;
+			}
 	}
 }

--
2.37.0.rc2.windows.1.7.g45a475aeb84





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux