Re: [PATCH v3 0/2] bundle URIs: design doc and initial git fetch --bundle-uri implementation

On 2022.07.25 13:53, Derrick Stolee via GitGitGadget wrote:
> This is the first of a series towards building the bundle URI feature as
> discussed in previous RFCs, specifically pulled directly out of [5]:
> 
> [1]
> https://lore.kernel.org/git/RFC-cover-00.13-0000000000-20210805T150534Z-avarab@xxxxxxxxx/
> [2]
> https://lore.kernel.org/git/cover-0.3-00000000000-20211025T211159Z-avarab@xxxxxxxxx/
> [3]
> https://lore.kernel.org/git/pull.1160.git.1645641063.gitgitgadget@xxxxxxxxx
> [4]
> https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@xxxxxxxxx/
> [5]
> https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@xxxxxxxxx
> 
> THIS ONLY INCLUDES THE DESIGN DOCUMENT. See "Updates in v3". There are two
> patches:
> 
>  1. The main design document that details the bundle URI standard and how
>     the client interacts with the bundle data.
>  2. An addendum to the design document that details one strategy for
>     organizing bundles from the perspective of a bundle provider.
> 
> As outlined in [5], the next steps after this are:
> 
>  1. Add 'git clone --bundle-uri=<uri>' to run a 'git bundle fetch <uri>' step
>     before doing a fetch negotiation with the origin remote. [6]
>  2. Allow parsing a bundle list as a config file at the given URI. The
>     key-value format is unified with the protocol v2 verb (coming in (3)).
>     [7]
>  3. Implement the protocol v2 verb, re-using the bundle list logic from (2).
>     Use this to auto-discover bundle URIs during 'git clone' (behind a
>     config option). [8]
>  4. Implement the 'creationToken' heuristic, allowing incremental 'git
>     fetch' commands to download a bundle list from a configured URI, and
>     only download bundles that are new based on the creation token values.
>     [9]
> 
> I have prepared some of this work as pull requests on my personal fork so
> curious readers can look ahead to where we are going:
> 
> [6] https://github.com/derrickstolee/git/pull/18
> [7] https://github.com/derrickstolee/git/pull/20
> [8] https://github.com/derrickstolee/git/pull/21
> [9] https://github.com/derrickstolee/git/pull/22
> 
> As mentioned in the design document, this is not all that is possible. For
> instance, Ævar's suggestion to download only the bundle headers can be used
> as a second heuristic (and as an augmentation of the timestamp heuristic).
> 
> 
> Updates in v3
> =============
> 
>  * This version only includes the design document. Thanks to all the
>    reviewers for the significant attention that improved the doc a lot.
>  * The second patch has an addition to the design document that details a
>    potential way to organize bundles from the provider's perspective.
>  * Based on some off-list feedback, I was going to switch git fetch
>    --bundle-uri into git bundle fetch, but that has a major conflict with
>    [10] which was just submitted.
>  * I will move the git bundle fetch implementation into [6] which also has
>    the git clone --bundle-uri implementation.
> 
> [10] https://lore.kernel.org/git/20220725123857.2773963-1-szeder.dev@xxxxxxxxx/
> 
> 
> Updates in v2
> =============
> 
>  * The design document has been updated based on Junio's feedback.
>  * The "bundle.list." keys are now just "bundle.".
>  * The "timestamp" heuristic is now "creationToken".
>  * More clarity on how Git parses data from the bundle URI.
>  * Dropped some unnecessary bundle list keys (*.list, *.requires).
> 
> Thanks, -Stolee
> 
> Derrick Stolee (2):
>   docs: document bundle URI standard
>   bundle-uri: add example bundle organization
> 
>  Documentation/Makefile                 |   1 +
>  Documentation/technical/bundle-uri.txt | 573 +++++++++++++++++++++++++
>  2 files changed, 574 insertions(+)
>  create mode 100644 Documentation/technical/bundle-uri.txt
> 
> 
> base-commit: e72d93e88cb20b06e88e6e7d81bd1dc4effe453f
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1248%2Fderrickstolee%2Fbundle-redo%2Ffetch-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1248/derrickstolee/bundle-redo/fetch-v3
> Pull-Request: https://github.com/gitgitgadget/git/pull/1248
> 
> Range-diff vs v2:
> 
>  1:  d444042dc4d ! 1:  e0f003e1b5f docs: document bundle URI standard
>      @@ Commit message
>       
>           Signed-off-by: Derrick Stolee <derrickstolee@xxxxxxxxxx>
>       
>      + ## Documentation/Makefile ##
>      +@@ Documentation/Makefile: TECH_DOCS += SubmittingPatches
>      + TECH_DOCS += ToolsForGit
>      + TECH_DOCS += technical/bitmap-format
>      + TECH_DOCS += technical/bundle-format
>      ++TECH_DOCS += technical/bundle-uri
>      + TECH_DOCS += technical/cruft-packs
>      + TECH_DOCS += technical/hash-function-transition
>      + TECH_DOCS += technical/http-protocol
>      +
>        ## Documentation/technical/bundle-uri.txt (new) ##
>       @@
>       +Bundle URIs
>       +===========
>       +
>      ++Git bundles are files that store a pack-file along with some extra metadata,
>      ++including a set of refs and a (possibly empty) set of necessary commits. See
>      ++linkgit:git-bundle[1] and link:bundle-format.txt[the bundle format] for more
>      ++information.
>      ++
>       +Bundle URIs are locations where Git can download one or more bundles in
>       +order to bootstrap the object database in advance of fetching the remaining
>       +objects from a remote.
>      @@ Documentation/technical/bundle-uri.txt (new)
>       +	If this string-valued key exists, then the bundle list is designed to
>       +	work well with incremental `git fetch` commands. The heuristic signals
>       +	that there are additional keys available for each bundle that help
>      -+	determine which subset of bundles the client should download.
>      ++	determine which subset of bundles the client should download. The only
>      ++  heuristic currently planned is `creationToken`.
>       +
>       +The remaining keys include an `<id>` segment which is a server-designated
>      -+name for each available bundle.
>      ++name for each available bundle. The `<id>` must contain only alphanumeric
>      ++and `-` characters.
>       +
>       +bundle.<id>.uri::
>       +	(Required) This string value is the URI for downloading bundle `<id>`.
>      @@ Documentation/technical/bundle-uri.txt (new)
>       +
>       +Here is an example bundle list using the Git config format:
>       +
>      -+```
>      -+[bundle]
>      -+	version = 1
>      -+	mode = all
>      -+	heuristic = creationToken
>      ++	[bundle]
>      ++		version = 1
>      ++		mode = all
>      ++		heuristic = creationToken
>       +
>      -+[bundle "2022-02-09-1644442601-daily"]
>      -+	uri = https://bundles.example.com/git/git/2022-02-09-1644442601-daily.bundle
>      -+	timestamp = 1644442601
>      ++	[bundle "2022-02-09-1644442601-daily"]
>      ++		uri = https://bundles.example.com/git/git/2022-02-09-1644442601-daily.bundle
>      ++		creationToken = 1644442601
>       +
>      -+[bundle "2022-02-02-1643842562"]
>      -+	uri = https://bundles.example.com/git/git/2022-02-02-1643842562.bundle
>      -+	timestamp = 1643842562
>      ++	[bundle "2022-02-02-1643842562"]
>      ++		uri = https://bundles.example.com/git/git/2022-02-02-1643842562.bundle
>      ++		creationToken = 1643842562
>       +
>      -+[bundle "2022-02-09-1644442631-daily-blobless"]
>      -+	uri = 2022-02-09-1644442631-daily-blobless.bundle
>      -+	timestamp = 1644442631
>      -+	filter = blob:none
>      ++	[bundle "2022-02-09-1644442631-daily-blobless"]
>      ++		uri = 2022-02-09-1644442631-daily-blobless.bundle
>      ++		creationToken = 1644442631
>      ++		filter = blob:none
>       +
>      -+[bundle "2022-02-02-1643842568-blobless"]
>      -+	uri = /git/git/2022-02-02-1643842568-blobless.bundle
>      -+	timestamp = 1643842568
>      -+	filter = blob:none
>      -+```
>      ++	[bundle "2022-02-02-1643842568-blobless"]
>      ++		uri = /git/git/2022-02-02-1643842568-blobless.bundle
>      ++		creationToken = 1643842568
>      ++		filter = blob:none
>       +
>       +This example uses `bundle.mode=all` as well as the
>       +`bundle.<id>.creationToken` heuristic. It also uses the `bundle.<id>.filter`
>      @@ Documentation/technical/bundle-uri.txt (new)
>       +* The client fails to connect with a server at the given URI or a connection
>       +  is lost without any chance to recover.
>       +
>      -+* The client receives a response other than `200 OK` (such as `404 Not Found`,
>      -+  `401 Not Authorized`, or `500 Internal Server Error`). The client should
>      -+  use the `credential.helper` to attempt authentication after the first
>      -+  `401 Not Authorized` response, but a second such response is a failure.
>      ++* The client receives a 400-level response (such as `404 Not Found` or
>      ++  `401 Not Authorized`). The client should use the credential helper to
>      ++  find and provide a credential for the URI, but match the semantics of
>      ++  Git's other HTTP protocols in terms of handling specific 400-level
>      ++  errors.
>       +
>      -+* The client receives data that is not parsable as a bundle or bundle list.
>      ++* The server reports any other failure response.
>       +
>      -+* The bundle list describes a directed cycle in the
>      -+  `bundle.<id>.requires` links.
>      ++* The client receives data that is not parsable as a bundle or bundle list.
>       +
>       +* A bundle includes a filter that does not match expectations.
>       +
>       +* The client cannot unbundle the bundles because the prerequisite commit OIDs
>      -+  are not in the object database and there are no more
>      -+  `bundle.<id>.requires` links to follow.
>      ++  are not in the object database and there are no more bundles to download.
>       +
>       +There are also situations that could be seen as wasteful, but are not
>       +error conditions:
>      @@ Documentation/technical/bundle-uri.txt (new)
>       +  the client is using hourly prefetches with background maintenance, but
>       +  the server is computing bundles weekly. For this reason, the client
>       +  should not use bundle URIs for fetch unless the server has explicitly
>      -+  recommended it through the `bundle.flags = forFetch` value.
>      ++  recommended it through a `bundle.heuristic` value.
>       +
>       +Implementation Plan
>       +-------------------
>      @@ Documentation/technical/bundle-uri.txt (new)
>       +   that the config format parsing feeds a list of key-value pairs into the
>       +   bundle list logic.
>       +
>      -+3. Create the `bundle-uri` protocol v2 verb so Git servers can advertise
>      ++3. Create the `bundle-uri` protocol v2 command so Git servers can advertise
>       +   bundle URIs using the key-value pairs. Plug into the existing key-value
>       +   input to the bundle list logic. Allow `git clone` to discover these
>       +   bundle URIs and bootstrap the client repository from the bundle data.
>  2:  0a2cf60437f < -:  ----------- remote-curl: add 'get' capability
>  3:  abec47564fd < -:  ----------- bundle-uri: create basic file-copy logic
>  4:  f6255ec5188 < -:  ----------- fetch: add --bundle-uri option
>  5:  bfbd11b48bf < -:  ----------- bundle-uri: add support for http(s):// and file://
>  6:  a217e9a0640 < -:  ----------- fetch: add 'refs/bundle/' to log.excludeDecoration
>  -:  ----------- > 2:  a933471c3af bundle-uri: add example bundle organization
> 
> -- 
> gitgitgadget

Looks good to me, thanks for the series!

Reviewed-by: Josh Steadmon <steadmon@xxxxxxxxxx>
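
For readers following along, the 'creationToken' idea in step (4) of the
quoted cover letter can be sketched roughly as below. This is only an
illustration under stated assumptions, not the implementation from the
series: the function names parse_bundle_list and new_bundles, the
last_seen_token parameter, and the oldest-first ordering are all
hypothetical. The input is the flat list of key-value pairs that the design
doc says is fed into the bundle list logic, and the sample pairs are copied
from the example bundle list in the range-diff. Filters such as blob:none
are ignored here.

```python
from collections import defaultdict


def parse_bundle_list(pairs):
    """Group flat (key, value) pairs into global settings and per-bundle dicts."""
    settings, bundles = {}, defaultdict(dict)
    for key, value in pairs:
        parts = key.split(".")
        if parts[0] != "bundle":
            continue  # ignore keys outside the bundle namespace
        if len(parts) == 2:
            settings[parts[1]] = value           # e.g. bundle.mode
        elif len(parts) == 3:
            bundles[parts[1]][parts[2]] = value  # e.g. bundle.<id>.uri
    return settings, dict(bundles)


def new_bundles(pairs, last_seen_token):
    """Return (id, uri) for bundles whose creationToken is above last_seen_token.

    Assumption: results are ordered oldest first purely for readability.
    Returns [] when the list does not advertise the creationToken heuristic.
    """
    settings, bundles = parse_bundle_list(pairs)
    if settings.get("heuristic") != "creationToken":
        return []
    fresh = [
        (int(b["creationToken"]), bundle_id, b["uri"])
        for bundle_id, b in bundles.items()
        if "uri" in b and "creationToken" in b
        and int(b["creationToken"]) > last_seen_token
    ]
    return [(bundle_id, uri) for _, bundle_id, uri in sorted(fresh)]


# Key-value pairs taken from the example bundle list in the design doc.
EXAMPLE = [
    ("bundle.version", "1"),
    ("bundle.mode", "all"),
    ("bundle.heuristic", "creationToken"),
    ("bundle.2022-02-09-1644442601-daily.uri",
     "https://bundles.example.com/git/git/2022-02-09-1644442601-daily.bundle"),
    ("bundle.2022-02-09-1644442601-daily.creationToken", "1644442601"),
    ("bundle.2022-02-02-1643842562.uri",
     "https://bundles.example.com/git/git/2022-02-02-1643842562.bundle"),
    ("bundle.2022-02-02-1643842562.creationToken", "1643842562"),
]

if __name__ == "__main__":
    # A client that already has the 2022-02-02 bundle only needs the newer one.
    for bundle_id, uri in new_bundles(EXAMPLE, last_seen_token=1643842562):
        print(bundle_id, uri)
```

Keeping the selection logic on plain key-value pairs mirrors the plan to
share it between the config-file list in (2) and the protocol v2 command in
(3); how a real client stores the last seen token is left open here.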


