Re: [BUG] t9801 and t9803 broken on next

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 17 May 2016, at 14:13, Jeff King <peff@xxxxxxxx> wrote:

> On Tue, May 17, 2016 at 10:07:16AM +0200, Lars Schneider wrote:
> 
>> I think that is pretty much the problem. Here is what is happening:
>> 
>> 1.  git-p4 imports all changelists for the "main" branch
>> 
>> 2.  git-p4 starts to import a second branch and creates a fast-import
>>    "progress checkpoint". This triggers:
>> 
>>    --> fast-import.c: cycle_packfile
>>    --> fast-import.c: end_packfile
>>    --> fast-import.c: loosen_small_pack
>> 
>>    At this point we have no packfile anymore.
>> 
>> 3.  git-p4 sends the command "commit refs/remotes/p4/depot/branch2"
>>    to fast-import in order to create a new branch. This triggers:
>> 
>>    --> fast-import.c: parse_new_commit
>>    --> fast-import.c: load_branch
>>    --> fast-import.c: load_tree
>> 
>>    "load_tree" calls "find_object" and the result has a "pack_id" of 0.
>>    I think this indicates that the object is in the packfile. Shouldn't
>>    the "pack_id" be "MAX_PACK_ID" instead?
>> 
>>        myoe = find_object(sha1);
>>        if (myoe && myoe->pack_id != MAX_PACK_ID) {
> 
> Thanks for the analysis. I think this is definitely the problem.  After
> fast-import finalizes a packfile (either at the end of the program or
> due to a checkpoint), it never discards its internal mapping of objects
> to pack locations. It _has_ to keep such a mapping before the pack is
> finalized, because git's regular object database doesn't know about it.
> But afterwards, it should be able to rely on the object database.
> 
> Retaining the mapping at the end of the program is obviously OK because
> we won't make any more lookups in it.
> 
> Retaining it at a checkpoint when we keep the packfile is OK, because we
> can (usually) still access that packfile (the exception would be if
> somebody runs "git repack" while fast-import is running).
> 
> But obviously a checkpoint where we throw away the pack needs to clear
> that mapping.
> 
> The patch below probably makes your case work, but there are a lot of
> open questions:
Confirmed. The offending tests pass with your patch.


>  1. Should we always discard the mapping, even when not loosening
>     objects? I can't think of a real downside to always using git's
>     object lookups.
> 
>  2. Can we free memory associated with object_entry structs at this
>     point? They won't be accessible via the hash, but might other bits
>     of the code have kept pointers to them?
> 
>     I suspect it may screw up the statistics that fast-import prints at
>     the end, but that's a minor point.
> 
>  3. I notice that a few other structs (branch and tag) hold onto the
>     pack_id, which will now point to a pack we can't access. Does this
>     matter? I don't think so, because checkpoint() seems to dump the
>     branches and tags.
>  4. In general, should we be loosening objects at checkpoints at all?
> 
>     I guess it is probably more efficient than creating a bunch of
>     little packs. And it should probably not happen much at all outside
>     of tests (i.e., callers should generally checkpoint after an
>     appreciable amount of work is done).
I am not too familiar with the code and therefore I can't give a good
answer to your questions. However, I noticed that Shawn (CC'ed) wrote a 
lot of fast-import code. Maybe he can help?

>From my point of view little packs are no problem. I run fast-import on
a dedicated migration machine. After fast-import completion I run repack [1] 
before I upload the repo to its final location.


Thanks,
Lars

[1] https://gcc.gnu.org/ml/gcc/2007-12/msg00165.html


> 
> diff --git a/fast-import.c b/fast-import.c
> index 0e8bc6a..9bfbfb0 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -597,6 +597,22 @@ static struct object_entry *insert_object(unsigned char *sha1)
> 	return e;
> }
> 
> +static void clear_object_table(void)
> +{
> +	unsigned int h;
> +	for (h = 0; h < ARRAY_SIZE(object_table); h++) {
> +		/*
> +		 * We can't individually free objects here
> +		 * because they are allocated from a pool.
> +		 */
> +		object_table[h] = NULL;
> +	}
> +	/*
> +	 * XXX maybe free object_entry_pool here,
> +	 * or might something still be referencing them?
> +	 */
> +}
> +
> static unsigned int hc_str(const char *s, size_t len)
> {
> 	unsigned int r = 0;
> @@ -1035,6 +1051,9 @@ discard_pack:
> 	pack_data = NULL;
> 	running = 0;
> 
> +	/* The objects are now available via git's regular lookups. */
> +	clear_object_table();
> +
> 	/* We can't carry a delta across packfiles. */
> 	strbuf_release(&last_blob.data);
> 	last_blob.offset = 0;

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]