Re: RFC - concurrency causes segfault in git grep since 2.26.0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, Phil

Thanks for the bug report and for the crash logs.

On Thu, Sep 24, 2020 at 11:36 PM Phil Hord <phil.hord@xxxxxxxxx> wrote:
>
> Hi list.  It's been a while.
>
> I can reliably reproduce a segfault in git grep when searching the
> object DB. It is caused by a race when threads > 2.
>
> I have a patch that fixes the problem, but I don't know exactly why.
> Can someone else explain it and/or offer a better solution?
>
> ====
>
> diff --git packfile.c packfile.c
> index e69012e7f2..52b7b54aeb 100644
> --- packfile.c
> +++ packfile.c
> @@ -1812,7 +1812,9 @@ void *unpack_entry(struct repository *r, struct
> packed_git *p, off_t obj_offset,
>                 if (!base)
>                         continue;
>
> +               obj_read_lock();
>                 delta_data = unpack_compressed_entry(p, &w_curs,
> curpos, delta_size);
> +               obj_read_unlock();
>
>                 if (!delta_data) {
>                         error("failed to unpack compressed delta "
>
> ====

Hmm, obj_read_mutex is a recursive lock, so by adding the
obj_read_lock() call here, we are incrementing the lock value to [at
least] 2 (because oid_object_info_extended() had already incremented
once). Therefore, when unpack_compressed_entry() calls obj_read_unlock()
before inflating the entry, the lock is not actually released, only
decremented. So the effect of this change is that the phase III of
unpack_entry() is performed entirely without releasing the lock.

Your crash log shows us that the segfault happened when trying to
memcpy() the string `base` (in unpack_entry()). I.e., the same string
that we had previously added to the delta base cache, right before
calling unpack_compressed_entry() and releasing the lock. The
problematic sequence is:
	add `base` to the cache -> release lock -> inflate ->
	acquire lock -> try to use `base`

Maybe, when a thread X releases the lock to perform decompression,
thread Y acquires the lock and tries to add another base to the cache.
The cache is full, so Y has to remove entries, and it ends up free()'ing
the base that was just inserted by X. Later, X tries to dereference
`base` in patch_entry(), which would cause a segfault. It would also
explain why your change solves the segfault.

I'm not sure, though, why this entry would be removed, since it was just
added... Maybe Y was adding a huge base to the cache, which required
removing all entries?

Anyways, you mentioned you can reproduce the failure quite reliably in your
repo with 20 threads, right? Could you please check with the following patch
applied? It adds a `base` copy to the cache instead of the original string,
allowing us to keep running decompression in parallel but without the risk of
having another thread free()'ing `base` in the mean time.

diff --git a/packfile.c b/packfile.c
index 9ef27508f2..79f83b6034 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1772,14 +1772,15 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 	while (delta_stack_nr) {
 		void *delta_data;
 		void *base = data;
-		void *external_base = NULL;
 		unsigned long delta_size, base_size = size;
 		int i;
 
 		data = NULL;
 
-		if (base)
-			add_delta_base_cache(p, obj_offset, base, base_size, type);
+		if (base) {
+			char *base_dup = memcpy(xmalloc(base_size), base, base_size);
+			add_delta_base_cache(p, obj_offset, base_dup, base_size, type);
+		}
 
 		if (!base) {
 			/*
@@ -1799,7 +1800,6 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 				      p->pack_name);
 				mark_bad_packed_object(p, base_oid.hash);
 				base = read_object(r, &base_oid, &type, &base_size);
-				external_base = base;
 			}
 		}
 
@@ -1818,7 +1818,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			      "at offset %"PRIuMAX" from %s",
 			      (uintmax_t)curpos, p->pack_name);
 			data = NULL;
-			free(external_base);
+			free(base);
 			continue;
 		}
 
@@ -1838,7 +1838,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 			error("failed to apply delta");
 
 		free(delta_data);
-		free(external_base);
+		free(base);
 	}
 
 	if (final_type)
--



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux