Re: [RFC][PATCH] Allow transfer of any valid sha1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <junkio@xxxxxxx> writes:

> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
>
>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index a3bcad0..c767d84 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -260,6 +260,27 @@ static void mark_recent_complete_commits
>>  	}
>>  }
>>  
>> +static struct ref **get_sha1_heads(struct ref **refs, int nr_heads, char
> **head)
>> +{
>> +	int i;
>> +	for (i  = 0; i < nr_heads; i++) {
>> +		struct ref *ref;
>> +		unsigned char sha1[20];
>> +		char *s = head[i];
>> +		int len = strlen(s);
>> +
>> +		if (len != 40 || get_sha1_hex(s, sha1))
>> +			continue;
>
> So the new convention is fetch-pack can take ref name (as
> before), or a bare 40-byte hexadecimal.  I think sane people
> would not use ambiguous refname that says "deadbeef" five times,
> and even if the do so they could disambiguate by explicitly
> saying "refs/heads/" followed by "deadbeef" five times, so it
> should be OK.

Yes.

>> +
>> +		ref = xcalloc(1, sizeof(*ref) + len + 1);
>> +		memcpy(ref->old_sha1, sha1, 20);
>> +		memcpy(ref->name, s, len + 1);
>> +		*refs = ref;
>> +		refs = &ref->next;
>> +	}
>> +	return refs;
>> +}
>> +
>
> This function takes the pointer to a location that holds a
> pointer to a "struct ref" -- it is the location to store the
> newly allocated ref structure, i.e. the next pointer of the last
> element in the list.  When it returns, the location pointed at
> by the pointer given to you points at the first element you
> allocated, and it returns the next pointer of the last element
> allocated by it.  That is the same calling convention as
> connect.c::get_remote_heads().  So when calling this function to
> append to a list you already have, you would give the next
> pointer to the last element of the existing list.  But you do
> not seem to do that.

Ack. That does look like a bug.  I knew there as something
fishy about that code.  But it worked for my basic testing so I didn't
worry about it.

> I think the body of fetch_pack() should become something like:
>
> 	struct ref *ref, **tail;
>
>         tail = get_remote_heads(fd[0], &ref, 0, NULL, 0);
> 	if (server_supports("multi_ack")) {
> 		...
> 	}
> 	tail = get_sha1_heads(tail, nr_match, match);
> 	if (everything_local(&ref, nr_match, match)) {
> 		...

Actually because we want the filter to resolve sha1s by
default in terms of what was passed on the command line.  I'm pretty
certain that should be:

	tail = get_sha1_heads(&ref, nr_match, match);
	tail = get_remote_heads(fd[0], tail, 0, NULL, 0);
        ...


>> @@ -311,6 +332,8 @@ static int everything_local(struct ref *
>>  	if (cutoff)
>>  		mark_recent_complete_commits(cutoff);
>>  
>> +	filter_refs(refs, nr_match, match);
>> +
>
> I am not sure about this change.

Agreed.  It was a hold over from an earlier way of injecting
the sha1 into the logic.  

As for what happens I think I need to audit everything that
takes a ref from fetch_pack.  To make certain I have not
messed up the logic.

> In the original code we do not let get_remote_heads() to filter
> the refs but call filter_refs() after the "mark all complete
> remote refs as common" step for a reason.  Even though we may
> not be fetching from some remote refs, we would want to take
> advantage of the knowledge of what objects they have so that we
> can mark as many objects as common as possible in the early
> stage.  I suspect this change defeats that optimization.

It feels like it.

> So instead I would teach "mark all complete remote refs" loop
> that not everything in refs list is a valid remote ref, and skip
> what get_sha1_heads() injected, because these arbitrary ones we
> got from the command line are not something we know exist on the
> remote side.  Maybe something like this.

Sounds sane.  We also introduce a new possibility of having a
ref that is complete but not remote.

> 	/*
> 	 * Mark all complete remote refs as common refs.
> 	 * Don't mark them common yet; the server has to be told so first.
> 	 */
> 	for (ref = *refs; ref; ref = ref->next) {
> 		struct object *o;
>                 if (ref is SHA1 from the command line)
>                 	continue;
> 		o = deref_tag(lookup_object(ref->old_sha1), NULL, 0);
> 		if (!o || o->type != commit_type || !(o->flags & COMPLETE))
> 			continue;
> 		...
>
> To implement "ref is SHA1 from the command line", I would add
> another 1-bit field to "struct ref" and mark the new ones you
> create in get_sha1_heads() as such (existing "force" field
> could also become an 1-bit field -- we do not neeed a char).

Sounds sane.
So that gives me:
	unsigned int force : 1;
	unsigned int injected : 1;

Which aligns them to an int boundary but since we are followed
immediately by a pointer should result in no additional storage being
consumed.

>> @@ -373,6 +394,7 @@ static int fetch_pack(int fd[2], int nr_
>>  		packet_flush(fd[1]);
>>  		die("no matching remote head");
>>  	}
>> +	get_sha1_heads(&ref, nr_match, match);
>
> I talked about this one already...
>
>> diff --git a/git-parse-remote.sh b/git-parse-remote.sh
>> index 187f088..2372df8 100755
>> --- a/git-parse-remote.sh
>> +++ b/git-parse-remote.sh
>> @@ -105,6 +105,7 @@ canon_refs_list_for_fetch () {
>>  		'') remote=HEAD ;;
>>  		refs/heads/* | refs/tags/* | refs/remotes/*) ;;
>>  		heads/* | tags/* | remotes/* ) remote="refs/$remote" ;;
>> +
> [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])
> ;;
>
> Yuck.  Don't we have $_x40 somewhere?

I couldn't find one in shell.  

> We never use uppercase so at least we could save 24 columns from
> here ;-).

I'm not certain why we always add make $remote="refs/heads/$remote" by
default in that switch statement. git-fetch-pack at least doesn't need
it.

If that is true of the other consumers we could easily make the test:
[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]*) ;;
Or even simply make the default case *) ;;

But for the moment I will stick to the long form because it is
obviously correct.

Eric
-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]