Re: [BUG] t9801 and t9803 broken on next

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 17, 2016 at 10:07:16AM +0200, Lars Schneider wrote:

> I think that is pretty much the problem. Here is what is happening:
> 
> 1.  git-p4 imports all changelists for the "main" branch
> 
> 2.  git-p4 starts to import a second branch and creates a fast-import
>     "progress checkpoint". This triggers:
> 
>     --> fast-import.c: cycle_packfile
>     --> fast-import.c: end_packfile
>     --> fast-import.c: loosen_small_pack
> 
>     At this point we have no packfile anymore.
> 
> 3.  git-p4 sends the command "commit refs/remotes/p4/depot/branch2"
>     to fast-import in order to create a new branch. This triggers:
> 
>     --> fast-import.c: parse_new_commit
>     --> fast-import.c: load_branch
>     --> fast-import.c: load_tree
> 
>     "load_tree" calls "find_object" and the result has a "pack_id" of 0.
>     I think this indicates that the object is in the packfile. Shouldn't
>     the "pack_id" be "MAX_PACK_ID" instead?
> 
>         myoe = find_object(sha1);
>         if (myoe && myoe->pack_id != MAX_PACK_ID) {

Thanks for the analysis. I think this is definitely the problem.  After
fast-import finalizes a packfile (either at the end of the program or
due to a checkpoint), it never discards its internal mapping of objects
to pack locations. It _has_ to keep such a mapping before the pack is
finalized, because git's regular object database doesn't know about it.
But afterwards, it should be able to rely on the object database.

Retaining the mapping at the end of the program is obviously OK because
we won't make any more lookups in it.

Retaining it at a checkpoint when we keep the packfile is OK, because we
can (usually) still access that packfile (the exception would be if
somebody runs "git repack" while fast-import is running).

But obviously a checkpoint where we throw away the pack needs to clear
that mapping.

The patch below probably makes your case work, but there are a lot of
open questions:

  1. Should we always discard the mapping, even when not loosening
     objects? I can't think of a real downside to always using git's
     object lookups.

  2. Can we free memory associated with object_entry structs at this
     point? They won't be accessible via the hash, but might other bits
     of the code have kept pointers to them?

     I suspect it may screw up the statistics that fast-import prints at
     the end, but that's a minor point.

  3. I notice that a few other structs (branch and tag) hold onto the
     pack_id, which will now point to a pack we can't access. Does this
     matter? I don't think so, because checkpoint() seems to dump the
     branches and tags.

  4. In general, should we be loosening objects at checkpoints at all?

     I guess it is probably more efficient than creating a bunch of
     little packs. And it should probably not happen much at all outside
     of tests (i.e., callers should generally checkpoint after an
     appreciable amount of work is done).

diff --git a/fast-import.c b/fast-import.c
index 0e8bc6a..9bfbfb0 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -597,6 +597,22 @@ static struct object_entry *insert_object(unsigned char *sha1)
 	return e;
 }
 
+static void clear_object_table(void)
+{
+	unsigned int h;
+	for (h = 0; h < ARRAY_SIZE(object_table); h++) {
+		/*
+		 * We can't individually free objects here
+		 * because they are allocated from a pool.
+		 */
+		object_table[h] = NULL;
+	}
+	/*
+	 * XXX maybe free object_entry_pool here,
+	 * or might something still be referencing them?
+	 */
+}
+
 static unsigned int hc_str(const char *s, size_t len)
 {
 	unsigned int r = 0;
@@ -1035,6 +1051,9 @@ discard_pack:
 	pack_data = NULL;
 	running = 0;
 
+	/* The objects are now available via git's regular lookups. */
+	clear_object_table();
+
 	/* We can't carry a delta across packfiles. */
 	strbuf_release(&last_blob.data);
 	last_blob.offset = 0;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]