Re: [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2

Jeff King <peff@xxxxxxxx> · Thu, 11 Nov 2021 10:18:05 -0500

On Thu, Nov 11, 2021 at 06:18:55AM +0100, Ævar Arnfjörð Bjarmason wrote:

> diff --git a/object-file.c b/object-file.c
> index 02b79702748..ac476653a06 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2528,6 +2528,8 @@ int read_loose_object(const char *path,
>  	char hdr[MAX_HEADER_LEN];
>  	unsigned long *size = oi->sizep;
>  
> +	*contents = NULL;
> +
>  	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
>  	if (!map) {
>  		error_errno(_("unable to mmap %s"), path);

OK, I agree this fixes the segfault, and is the minimal fix.

I do find the fact that fsck_loose() looks at "contents" after
read_loose_object() returns an error to be a bit questionable. It's a
recipe for confusion about what has happened, and who is supposed to
free what.  Your v2 addresses the leak, but by just shifting more burden
to the caller. There's only one caller, so it's not too bad, but for a
public function, read_loose_object() has a lot of sharp edges.

Plus I think it fails to work as intended for streaming blobs (we do not
fill in "contents" at all in that case, so we can never say "hash-path
mismatch").

I understand you're trying to catch the case of "we actually opened the
file and computed the sha1 of its contents" from cases where we didn't
get that far. But since you initialize real_oid, it seems like it would
be better to see if anything was written to that.

I.e., something like:

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d87c28a1cc..8f156ed9cd 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -617,18 +617,20 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	oi.typep = &type;
 
 	if (read_loose_object(path, oid, &real_oid, &contents, &oi) < 0) {
-		if (contents && !oideq(&real_oid, oid))
+		if (!is_null_oid(&real_oid) && !oideq(&real_oid, oid))
 			err = error(_("%s: hash-path mismatch, found at: %s"),
 				    oid_to_hex(&real_oid), path);
 		else
 			err = error(_("%s: object corrupt or missing: %s"),
 				    oid_to_hex(oid), path);
+		errors_found |= ERROR_OBJECT;
+		return 0; /* keep checking other objects */
 	}
-	if (type != OBJ_NONE && type < 0)
+	if (type != OBJ_NONE && type < 0) {
 		err = error(_("%s: object is of unknown type '%s': %s"),
 			    oid_to_hex(&real_oid), cb_data->obj_type.buf,
 			    path);
-	if (err < 0) {
+		free(contents);
 		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}

(the "err" variable is now superfluous, but I left it in to keep the
diff smaller). And then it would be safe to just set "contents" in
read_loose_object() when we need it:

diff --git a/object-file.c b/object-file.c
index ac476653a0..5e8ff94fd4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2528,8 +2528,6 @@ int read_loose_object(const char *path,
 	char hdr[MAX_HEADER_LEN];
 	unsigned long *size = oi->sizep;
 
-	*contents = NULL;
-
 	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
 	if (!map) {
 		error_errno(_("unable to mmap %s"), path);
@@ -2549,6 +2547,7 @@ int read_loose_object(const char *path,
 	}
 
 	if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
+		*contents = NULL;
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
 			goto out;
 	} else {

That doesn't fix the hash-path mismatch problem for streaming, but it
sets us up to do so, if check_stream_oid() returned the real_oid it
computed.

All of this is much too large for an -rc fix, so we should take your
patch as-is. These are just thoughts I had while trying to figure out
if there were other problems caused by that same commit.

-Peff