Re: [PATCH 2/3] protocol v2: specify static seeding of clone/fetch via "bundle-uri"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 26 2021, Derrick Stolee wrote:

> On 10/25/2021 5:25 PM, Ævar Arnfjörð Bjarmason wrote:
>> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
>> +bundle-uri CLIENT ERROR RECOVERY
>> +++++++++++++++++++++++++++++++++
>> +
>> +A client MUST above all gracefully degrade on errors, whether that
>> +error is because of bad missing/data in the bundle URI(s), because
>> +that client is too dumb to e.g. understand and fully parse out bundle
>> +headers and their prerequisite relationships, or something else.
>
> "too dumb" seems a bit informal to me, especially because you
> immediately elaborate on its meaning. You could rewrite as follows:
>
>   ...because
>   that client can't understand or fully parse out bundle
>   headers and their prerequisite relationships, or something else.

Thanks, I've snipped all your subsequent comments on
phrasing/clarifications etc, except insofar as they have questions I
need to address (as opposed to just my bad grammar/phrasing etc).

Thanks a lot for them, will go through them closely for any subsequent
re-roll & address them.

> [...]
>> +While no per-URI key-value are currently supported currently they're
>> +intended to support future features such as:
>> +
>> + * Add a "hash=<val>" or "size=<bytes>" advertise the expected hash or
>> +   size of the bundle file.
>
> I suppose if one wanted to add this server-to-bundle coupling, then some
> clients might appreciate it.

For packfile-uri there's a hard dependency on the server transferring
the hash of the PACK file.

I've intentionally omitted it, the reasons are covered in [1], which I
realize now should really be part of this early series.

Basically having it as a hard requirement isn't necessary for security
or payload validation. Any server who's worried about their transport
integrity would point to a https URI under their control, any
checksumming and validation we'll need we'll get from the transport
layer and the client's reachability check.

Having it would mean that you need closer cooperation by default between
server and CDN than I'm aiming for, i.e. a server should be able to
point to some URI somewhere updated by a dumb hourly cronjob, without
needing to pass information back & forth about what the "current" URL
is. The client will discover all that.

But I left that "hash=*" in because it could be optionally added, in
case someone really wants it for some reason...

1. https://lore.kernel.org/git/RFC-patch-13.13-1e657ed27a-20210805T150534Z-avarab@xxxxxxxxx/

>> + * Advertise that one or more bundle files are the same (to e.g. have
>> +   clients round-robin or otherwise choose one of N possible files).
>
>   * Advertise that one or more bundle files are the same, to allow for
>     redundancy without causing duplicated effort.

*nod*

>> +static void send_bundle_uris(struct packet_writer *writer,
>> +			     struct string_list *uris)
>> +{
>> +	struct string_list_item *item;
>> +
>> +	for_each_string_list_item(item, uris)
>> +		packet_writer_write(writer, "%s", item->string);
>> +}
>> +
>> +static int advertise_bundle_uri = -1;
>> +static struct string_list bundle_uris = STRING_LIST_INIT_DUP;
>
> I see you put send_bundle_uris() before the global bundle_uris so
> it can be independent, but do you expect anyone to call send_bundle_uris()
> via a different list?

No, I'll move that around or rather fold it into bundle_uri_command()
directly.

I think I'd originally copied the structure of send_ref() and ls_refs()
from ls-refs.c, but it doesn't make much sense anymore here for this
2-line function. Thanks.

> Should we find a different place to store this data?
>
>> +static int bundle_uri_config(const char *var, const char *value, void *data)
>> +{
>> +	if (!strcmp(var, "uploadpack.bundleuri")) {
>> +		advertise_bundle_uri = 1;
>> +		string_list_append(&bundle_uris, value);
>> +	}
>> +
>> +	return 0;
>> +}
>
> Here, we are dictating that the URI list is available as a multi-valued
> config "uploadpack.bundleuri".
>
> 1. Should this be updated in Documentation/config/uploadpack.txt?

Definitely. I'll either incorporate that or re-structure this leading
series so that it's more design-doc/protocol focused, in any case all of
this ends up documented in the right places eventually...

> 2. This seems difficult to extend to your possible future features as
>    listed in the protocol docs, mainly because this can only store the
>    flat URI string. To add things like hash values, sizes, and prereqs,
>    you would need more data included and grouped on a per-URI basis.
>    What plans do you have to make extensions here while remaining
>    somewhat compatible with downgrading Git versions?

[...addressed below...]

>> @@ -136,6 +137,11 @@ static struct protocol_capability capabilities[] = {
>>  		.advertise = always_advertise,
>>  		.command = cap_object_info,
>>  	},
>> +	{
>> +		.name = "bundle-uri",
>> +		.advertise = bundle_uri_advertise,
>> +		.command = bundle_uri_command,
>> +	},
>>  };
>
> I really appreciate that it is this simple to extend protocol v2.

Yeah! FWIW I've got some WIP patches to make it easier still, i.e. some
further simplification & validation in the serve.[ch] API.

>> +test_expect_success 'basics of bundle-uri: dies if not enabled' '
>> +	test-tool pkt-line pack >in <<-EOF &&
>> +	command=bundle-uri
>> +	0000
>> +	EOF
>> +
>> +	cat >err.expect <<-\EOF &&
>> +	fatal: invalid command '"'"'bundle-uri'"'"'
>> +	EOF
>> +
>> +	cat >expect <<-\EOF &&
>> +	ERR serve: invalid command '"'"'bundle-uri'"'"'
>> +	EOF
>> +
>> +	test_must_fail test-tool serve-v2 --stateless-rpc <in >out 2>err.actual &&
>> +	test_cmp err.expect err.actual &&
>> +	test_must_be_empty out
>> +'
>> +
>> +
>
> hyper-nit: double newline.
>
> The implementation seems simple enough, which I like. I'm a bit

I mentally inserted the missing "skeptical/uncertain" etc. here :)

> your current use of Git config as the back-end, only because it is
> difficult to be future-proof. As the functionality stands today, the
> current config design works just fine. Perhaps we don't need to
> worry about the future, because we can design a new, complementary
> storage for that extra data. It seems worth exploring for a little
> while, though. Perhaps we should take a page out of 'git-remote'
> and how it stores named remotes with sub-items for metadata. The
> names probably don't need to ever be exposed to users, but it could
> be beneficial to anyone implementing this scheme.
>
> [bundle "main"]
> 	uri = https://example.com/my-bundle
> 	uri = https://redundant-cdn.com/my-bundle
> 	size = 120523
> 	sha256 = {64hexchars}
>
> [bundle "fork"]
> 	uri = https://cdn.org/my-fork
> 	size = 334
> 	sha256 = {...}
> 	prereq = {oid}
>
> This kind of layout has an immediate grouping of data that should
> help any future plan. Notice how I included multiple "uri" lines
> in the "main", which helps with your plan for duplicate URIs.

At first sight I like that config schema much better than my current
one, in particular how it makes the future-proofed "these N urls are one
logical URL" case simpler.

But overall I'm a bit on the fence, and leaning towards keeping what I
have, not out of any lazynes or wanting to just keep what I have mind
you.

But rather that the main benefit of the current one is that it's a 1=1
mapping to the line-based protocol, and you can say update your URLs as:

    git config --replace-all uploadpack.bundleUri "$first_url" &&
    git config --add uploadpack.bundleUri "$second_url"

Having usually you'd know the URL you'd like to replace, so you can use
the [value-pattern] of --replace-all, if it's a named section or other
split-out structure that become a two-step lookup.

Also for testing I've got a (trivial) plumbing tool I'll submit called
"git ls-remote-bundle-uri" (could be folded into something else I guess)
to dump the server-side config, it's nice that you can pretty much
directly copy/paste it into config without needing to adjust it.

Having said all that I'm not sure I feel strongly about it either way,
what do you think given the above?

I think most "real" server operators will use this as
GIT_CONFIG_COUNT=<n> GIT_CONFIG_{KEY,VALUE}_<1..n>, which can of course
work with any config schema, but if you've got code generating it on the
other side naming the targets is probably a slight hassle / confusing.

There's also the small matter of it being consistent with the
packfile-uri config in its current form, but that shouldn't be a reason
not to come up with something better. If anything any better suggestion
(if we go for that) could be supported by it too...




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux