Re: [RFC PATCH 0/5] strvec: add a "nodup" mode, fix memory leaks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 15.12.22 um 10:11 schrieb Ævar Arnfjörð Bjarmason:
> This is an alternative to René's [1], his already fixes a leak in "git
> am", and this could be done later, so I'm submitting it as RFC, but it
> could also replace it.
>
> I think as this series shows extending the "strvec" API to get a
> feature that works like the existing "strdup_strings" that the "struct
> string_list" has can make memory management much simpler.
>
> The 4/5 here shows cases where we were leaking because our "v" was
> clobbered, but where all the strings we were pushing to the "strvec"
> were fixed strings, so we could skip xstrdup()-ing them.
>
> The 5/5 then shows more complex cases where we have mixed-use,
> i.e. most strings are fixed, but some are not. For those we use a
> "struct string_list to_free = STRING_LIST_INIT_DUP", which we then
> push to the "to_free" list with "string_list_append_nodup()".
>
> This does make the API slightly more dangerous to use, as it's no
> longer guaranteed that it owns all the members it points to. But as
> the "struct string_list" has shown this isn't an issue in practice,
> and e.g. SANITIZE=address et al are good about finding double-frees,
> or frees of fixed strings.
>
> A branch & CI for this are found at [2].
>
> 1. https://lore.kernel.org/git/baf93e4a-7f05-857c-e551-09675496c03c@xxxxxx/
> 2. https://github.com/avar/git/tree/avar/add-and-use-strvec-init-nodup
>
> Ævar Arnfjörð Bjarmason (5):
>   builtin/annotate.c: simplify for strvec API
>   various: add missing strvec_clear()
>   strvec API: add a "STRVEC_INIT_NODUP"
>   strvec API users: fix leaks by using "STRVEC_INIT_NODUP"
>   strvec API users: fix more leaks by using "STRVEC_INIT_NODUP"
>
>  builtin/am.c                  |  2 +-
>  builtin/annotate.c            | 17 ++++++++---------
>  builtin/describe.c            | 28 +++++++++++++++++++++-------
>  builtin/stash.c               |  8 ++++++--
>  builtin/upload-archive.c      | 16 ++++++++++++----
>  strvec.c                      | 20 ++++++++++++++++++--
>  strvec.h                      | 30 +++++++++++++++++++++++++++++-
>  t/t0023-crlf-am.sh            |  1 +
>  t/t4152-am-subjects.sh        |  2 ++
>  t/t4254-am-corrupt.sh         |  2 ++
>  t/t4256-am-format-flowed.sh   |  1 +
>  t/t4257-am-interactive.sh     |  2 ++
>  t/t5003-archive-zip.sh        |  1 +
>  t/t5403-post-checkout-hook.sh |  1 +
>  14 files changed, 105 insertions(+), 26 deletions(-)
>

This complicates strvec to gain functionality that we basically already
have with REALLOC_ARRAY.  It allows for nice conversions in some cases
(patch 4), but places that require some kind of double accounting become
quite nasty (patch 5).  I'd prefer to keep strvec simple.  Avoiding
allocations isn't necessary because the number of options to parse is
low -- we just need to keep track of and release them at the end.

The problem here is that parse_options() takes a string list in the form
of an int paired with a const char ** and modifies this list in place.
That's fine if we don't care about the memory ownership of the list and
its elements, e.g. because we get it from the OS (main() style) or
accept leakage.

It's not compatible with the list being a strvec that we don't want to
leak, though.  What we need is something that translates from strvec to
the argc/argv convention.  If we allow leaks this is trivial: Just use
the .v member directly.  If we don't then it is still simple: Make a
shallow copy of the .v member, release it after use.  Demo patch below.

 builtin/am.c             |  5 ++++-
 builtin/annotate.c       |  9 ++++++++-
 builtin/describe.c       |  8 +++++++-
 builtin/stash.c          | 10 +++++++++-
 builtin/upload-archive.c | 11 +++++++++--
 strvec.c                 |  9 +++++++++
 strvec.h                 |  7 +++++++
 7 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 30c9b3a9cd..a80d40f4be 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1476,14 +1476,16 @@ static int run_apply(const struct am_state *state, const char *index_file)
 	int res, opts_left;
 	int force_apply = 0;
 	int options = 0;
+	const char **apply_argv;

 	if (init_apply_state(&apply_state, the_repository, NULL))
 		BUG("init_apply_state() failed");

 	strvec_push(&apply_opts, "apply");
 	strvec_pushv(&apply_opts, state->git_apply_opts.v);
+	apply_argv = strvec_clone_nodup(&apply_opts);

-	opts_left = apply_parse_options(apply_opts.nr, apply_opts.v,
+	opts_left = apply_parse_options(apply_opts.nr, apply_argv,
 					&apply_state, &force_apply, &options,
 					NULL);

@@ -1513,6 +1515,7 @@ static int run_apply(const struct am_state *state, const char *index_file)
 	strvec_clear(&apply_paths);
 	strvec_clear(&apply_opts);
 	clear_apply_state(&apply_state);
+	free(apply_argv);

 	if (res)
 		return res;
diff --git a/builtin/annotate.c b/builtin/annotate.c
index 58ff977a23..7e75f0f48d 100644
--- a/builtin/annotate.c
+++ b/builtin/annotate.c
@@ -10,6 +10,8 @@
 int cmd_annotate(int argc, const char **argv, const char *prefix)
 {
 	struct strvec args = STRVEC_INIT;
+	const char **blame_argv;
+	int ret;
 	int i;

 	strvec_pushl(&args, "annotate", "-c", NULL);
@@ -17,6 +19,11 @@ int cmd_annotate(int argc, const char **argv, const char *prefix)
 	for (i = 1; i < argc; i++) {
 		strvec_push(&args, argv[i]);
 	}
+	blame_argv = strvec_clone_nodup(&args);

-	return cmd_blame(args.nr, args.v, prefix);
+	ret = cmd_blame(args.nr, blame_argv, prefix);
+
+	strvec_clear(&args);
+	free(blame_argv);
+	return ret;
 }
diff --git a/builtin/describe.c b/builtin/describe.c
index eea1e330c0..2ef2fbc0d4 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -598,6 +598,8 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 	if (contains) {
 		struct string_list_item *item;
 		struct strvec args;
+		const char **name_rev_argv;
+		int ret;

 		strvec_init(&args);
 		strvec_pushl(&args, "name-rev",
@@ -616,7 +618,11 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 			strvec_pushv(&args, argv);
 		else
 			strvec_push(&args, "HEAD");
-		return cmd_name_rev(args.nr, args.v, prefix);
+		name_rev_argv = strvec_clone_nodup(&args);
+		ret = cmd_name_rev(args.nr, name_rev_argv, prefix);
+		strvec_clear(&args);
+		free(name_rev_argv);
+		return ret;
 	}

 	hashmap_init(&names, commit_name_neq, NULL, 0);
diff --git a/builtin/stash.c b/builtin/stash.c
index bb0fd86143..864b7c573a 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1820,6 +1820,8 @@ static int save_stash(int argc, const char **argv, const char *prefix)

 int cmd_stash(int argc, const char **argv, const char *prefix)
 {
+	int ret;
+	const char **push_stash_argv;
 	pid_t pid = getpid();
 	const char *index_file;
 	struct strvec args = STRVEC_INIT;
@@ -1861,5 +1863,11 @@ int cmd_stash(int argc, const char **argv, const char *prefix)
 	/* Assume 'stash push' */
 	strvec_push(&args, "push");
 	strvec_pushv(&args, argv);
-	return !!push_stash(args.nr, args.v, prefix, 1);
+	push_stash_argv = strvec_clone_nodup(&args);
+
+	ret = !!push_stash(args.nr, push_stash_argv, prefix, 1);
+
+	strvec_clear(&args);
+	free(push_stash_argv);
+	return ret;
 }
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 945ee2b412..36ed85e10a 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -19,6 +19,8 @@ static const char deadchild[] =

 int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
 {
+	int ret;
+	const char **upload_archive_argv;
 	struct strvec sent_argv = STRVEC_INIT;
 	const char *arg_cmd = "argument ";

@@ -43,10 +45,15 @@ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix)
 			die("'argument' token or flush expected");
 		strvec_push(&sent_argv, buf + strlen(arg_cmd));
 	}
+	upload_archive_argv = strvec_clone_nodup(&sent_argv);

 	/* parse all options sent by the client */
-	return write_archive(sent_argv.nr, sent_argv.v, prefix,
-			     the_repository, NULL, 1);
+	ret = write_archive(sent_argv.nr, upload_archive_argv, prefix,
+			    the_repository, NULL, 1);
+
+	strvec_clear(&sent_argv);
+	free(upload_archive_argv);
+	return ret;
 }

 __attribute__((format (printf, 1, 2)))
diff --git a/strvec.c b/strvec.c
index 61a76ce6cb..5c41c8c506 100644
--- a/strvec.c
+++ b/strvec.c
@@ -106,3 +106,12 @@ const char **strvec_detach(struct strvec *array)
 		return ret;
 	}
 }
+
+const char **strvec_clone_nodup(const struct strvec *sv)
+{
+	const char **ret;
+
+	CALLOC_ARRAY(ret, sv->nr + 1);
+	COPY_ARRAY(ret, sv->v, sv->nr);
+	return ret;
+}
diff --git a/strvec.h b/strvec.h
index 9f55c8766b..6234634b00 100644
--- a/strvec.h
+++ b/strvec.h
@@ -88,4 +88,11 @@ void strvec_clear(struct strvec *);
  */
 const char **strvec_detach(struct strvec *);

+/**
+ * Create a copy of the array that references the original strings.
+ * The caller is responsible for freeing the memory of that copy,
+ * but must not free the individual strings.
+ */
+const char **strvec_clone_nodup(const struct strvec *);
+
 #endif /* STRVEC_H */




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux