Re: Alternates and push

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <gitster@xxxxxxxxx> writes:

> Theodore Tso <tytso@xxxxxxx> writes:
>
>> One way that wouldn't break backwards compatibility would be to use
>> $D/refs if it exists, but if it isn't, fall back to existing behavior
>> (which is to say, only use the refs in the repository, not in the
>> borrowed repository/object store).  Is there a reason why this would
>> be problematic?
>
> This does not answer "is this safe enough change?" question, but the code
> to implement this should look like this (Area expert Daniel CC'ed).

Oops, I forgot to Cc Daniel.  Shawn also CC'ed because he will have the
same issue with the smart CGI based http-push he is building.

> +static int add_refs_from_alternate(struct alternate_object_database *e, void *unused)
> +{
> +	char *other = xstrdup(make_absolute_path(e->base));
> +...
> +	remote = remote_get(other);
> +	transport = transport_get(remote, other);
> +	for (extra = transport_get_remote_refs(transport);
> +	     extra;
> +	     extra = extra->next) {
> +		/*
> +		 * Make sure this is named uniquely but with invalid name, so
> +		 * that we can catch the other end trying to do anything funky
> +		 * with it.
> +		 */
> +		strbuf_reset(&buf);
> +		strbuf_addf(&buf, "refs/borrowed-ref/%s", extra->name);
> +		add_extra_ref(buf.buf, extra->old_sha1, 0);
> +	}
> +	transport_disconnect(transport);
> +	free(other);
> +	return 0;
> +}

I need to clarify the comment in the above function.

When you are running "git push", the protocol employed is the send-pack
protocol, which begins by the receiving end advertising what refs it has
and where they point at.  The above code is in the receiving end and adds
extra refs from alternate repositories to this advertisement.

The advertisement serves two different purposes.  One is the immediate
issue we are tackling --- to tell the sending end what objects are not
needed in the pack it sends.  Objects that are reachable from the objects
in this advertisement do not have to be sent.  When the sending end wants
to send its 'master', for example, it uses this information to compute the
set of objects by doing (roughly):

	git rev-list --objects master --not $sha1_1 $sha1_2 ...

where $sha1_N are the objects we advertise here.  The more we assure the
sender we have, the less it has to send to us.

But there is another purpose the sender uses this information for.  It is
used to determine which refs to push (when "git push" is run without any
explicit refspecs, it does "matching refs" -- it learns the set of refs
the receiving end has from this advertisement, and updates the refs of the
same name the sending end also has), and which branches on the receiving
end should be removed (when "git push --mirror" is run, it will send NULL
updates to refs the receiving end has but the sender doesn't).

We do want to show the extra object names to the sender so that it can
exclude more objects from the transfer, but for the latter purpose, we
really do not want these phoney refs to be understood as refs we have by
the sending end.  In the original version of my patch, I actually had
"refs/borrowed*refs%s/%s" as the format string (notice the asterisk) to
make it an invalid refname, and (notice the extra %s) to make them unique
by including the full-pathname to the repository as part of this string,
so that we can pretend we have different 'master' from two alternate
repositories.

The sending end, however, has a safety to silently ignore malformed refs
it learns from the receiving end over the wire.  So the current sender
won't work with asterisk in there '*', nor full-pathname, because
typically it contains "/.git/" in the string, which would make it an
invalid refname to be ignored.

Also, the sending end remembers the objects we have per refname, so
sending two records with the same refname pointing at two different
objects will not work well either (the current code is loose and does not
check duplicates, but that is not a feature by design but is an accident).
If the receiving repository borrows from two alternate repositories, both
of which have 'master' branch, we would be sending two records, each of
which claims that it is "refs/borrowed-ref/refs/heads/master" but pointing
at a different commit.

That is the "unique and invalid" comment above is about (iow, the above
hunk that does not send "unique and invalid" refs cannot be the final form
of this enhancement).

What this means is that we also need to update the sending side in order
to enable this enhancement.  connect.c::get_remote_heads() is the relevant
code and we would probably need to add an option to keep all refs (and
invalid refs) in the returned list.  Both remote.c::match_refs() and
builtin-send-pack.c::do_send_pack() need to be taught to ignore the
invalid refs in the list, while builtin-send-pack.c::pack_objects() should
utilize all the refs (and invalid ones) when excluding the objects the
receiving end claims to have.  While at it, we probably should declare
that sending duplicate invalid refs is not an error (and we can use
something like "*borrowed*" as the phoney refname --- we do not need the
uniqueness for them, nor we need to tell the other end what ref we are
borrowing from whom).

We can start sending invalid refs (just replace - with * in the patch I
sent) from the receiving end without waiting for the sender end program to
get updated.  The sending end will ignore it and nothing (other than the
extra startup overhead the additional code has) bad should happen.  When
the sender is updated to keep the invalid refs, it will start to take
notice and your push will suddenly get smaller.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux