Re: [PATCH 2/3] protocol v2: specify static seeding of clone/fetch via "bundle-uri"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/26/2021 11:00 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Oct 26 2021, Derrick Stolee wrote:
>> The implementation seems simple enough, which I like. I'm a bit
> 
> I mentally inserted the missing "skeptical/uncertain" etc. here :)

More "uncertain" than "skeptical". The current plan works perfectly
for the current implementation, so there is an element of YAGNI that
could easily lead us to avoid overthinking this.
 
>> your current use of Git config as the back-end, only because it is
>> difficult to be future-proof. As the functionality stands today, the
>> current config design works just fine. Perhaps we don't need to
>> worry about the future, because we can design a new, complementary
>> storage for that extra data. It seems worth exploring for a little
>> while, though. Perhaps we should take a page out of 'git-remote'
>> and how it stores named remotes with sub-items for metadata. The
>> names probably don't need to ever be exposed to users, but it could
>> be beneficial to anyone implementing this scheme.
>>
>> [bundle "main"]
>> 	uri = https://example.com/my-bundle
>> 	uri = https://redundant-cdn.com/my-bundle
>> 	size = 120523
>> 	sha256 = {64hexchars}
>>
>> [bundle "fork"]
>> 	uri = https://cdn.org/my-fork
>> 	size = 334
>> 	sha256 = {...}
>> 	prereq = {oid}
>>
>> This kind of layout has an immediate grouping of data that should
>> help any future plan. Notice how I included multiple "uri" lines
>> in the "main", which helps with your plan for duplicate URIs.
> 
> At first sight I like that config schema much better than my current
> one, in particular how it makes the future-proofed "these N urls are one
> logical URL" case simpler.
> 
> But overall I'm a bit on the fence, and leaning towards keeping what I
> have, not out of any lazynes or wanting to just keep what I have mind
> you.
> 
> But rather that the main benefit of the current one is that it's a 1=1
> mapping to the line-based protocol, and you can say update your URLs as:
> 
>     git config --replace-all uploadpack.bundleUri "$first_url" &&
>     git config --add uploadpack.bundleUri "$second_url"
> 
> Having usually you'd know the URL you'd like to replace, so you can use
> the [value-pattern] of --replace-all, if it's a named section or other
> split-out structure that become a two-step lookup.

Don't forget to use --fixed-value for exact string matching instead of
regex matching!

> Also for testing I've got a (trivial) plumbing tool I'll submit called
> "git ls-remote-bundle-uri" (could be folded into something else I guess)
> to dump the server-side config, it's nice that you can pretty much
> directly copy/paste it into config without needing to adjust it.

With the appropriate helper structs and methods in the product code,
such helper tools will still be simple without being a second place
that is directly aware of how the values are stored to disk. I don't
judge your prototype work that helps you build the feature, but it's
simultaneously not a reason to stick to a design.

> Having said all that I'm not sure I feel strongly about it either way,
> what do you think given the above?

I'm not feeling too strong about it right now. The current design
does not need anything extra, but it also purposefully leaves certain
things open for extension in the future.

The thing I worry about is that there will be two supported ways to
store a list of bundle URIs: a flat list of URIs in the multi-valued
uploadPack.bundleURI config value, but then also a second option that
allows the extensions that arise. It's a layer of complication that
would be nice to avoid if there was an easy way to do it, but the
schema I sketched earlier isn't simple enough to merit a switch right
now. Perhaps someone else will have an idea that accomplishes the
same goal, but also is less complicated?
 
> I think most "real" server operators will use this as
> GIT_CONFIG_COUNT=<n> GIT_CONFIG_{KEY,VALUE}_<1..n>, which can of course
> work with any config schema, but if you've got code generating it on the
> other side naming the targets is probably a slight hassle / confusing.

You'd really overload the environment this way? That's not how I would
approach it, but maybe there is a benefit over writing to the repository's
config file. I suppose that you could store the data in a database and
link it to the repository at runtime instead.

> There's also the small matter of it being consistent with the
> packfile-uri config in its current form, but that shouldn't be a reason
> not to come up with something better. If anything any better suggestion
> (if we go for that) could be supported by it too...

What do you mean about being consistent with packfile-uri? This layer
that we care about isn't even implemented in git.git.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux