This is the third series building the bundle URI feature. It is built on top of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=' where is a URI to a bundle file. This series adds the capability of downloading and parsing a bundle list and then downloading the URIs in that list. The core functionality of bundle lists is implemented by creating data structures from a list of key-value pairs. These pairs can come from a plain-text file in Git config format, but in the future, we will support the list being supplied by packet lines over Git's protocol v2 in the 'bundle-uri' command (reserved for the next series). The patches are organized in this way (updated for v4): 1. Patch 1 is a cleanup from the previous part. This allows us to simplify our bundle list data structure slightly. 2. Patches 2-3 create the bundle list data structures and the logic for populating the list from key-value pairs. 3. Patches 4-5 teach Git to parse "key=value" lines to construct a bundle list. Add unit tests that ensure this logic constructs lists correctly. These patches are adapted from Ævar's RFC [1] and were previously seen in my combined RFC [2]. 4. Patch 6 teaches Git to parse Git config files into bundle lists. 5. Patches 7-9 implement the ability to download a bundle list and recursively download the contained bundles (and possibly the bundle lists within). This is limited by a constant depth to avoid issues with cycles or otherwise incorrectly configured bundle lists. This also fixes a previous bug when running verify_bundle() multiple times in the same process, as it did not clear the PREREQ_MARK flag upon leaving (see patch 8). 6. Patches 10-12 suppress unhelpful warnings from user visibility. [1] https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@xxxxxxxxx/ [2] https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@xxxxxxxxx/ At the end of this series, users can bootstrap clones using 'git clone --bundle-uri= ' where points to a bundle list instead of a single bundle file. As outlined in the design document [1], the next steps after this are: 1. Implement the protocol v2 verb, re-using the bundle list logic from (2). Use this to auto-discover bundle URIs during 'git clone' (behind a config option). [2] 2. Implement the 'creationToken' heuristic, allowing incremental 'git fetch' commands to download a bundle list from a configured URI, and only download bundles that are new based on the creation token values. [3] I have prepared some of this work as pull requests on my personal fork so curious readers can look ahead to where we are going: [3] https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@xxxxxxxxx [4] https://github.com/derrickstolee/git/pull/21 [5] https://github.com/derrickstolee/git/pull/22 Updates in v5 ============= * The bug about verify_bundle() not working multile times in the same process is fixed without removing the revision walk. Instead, more flags needed to be removed as the method cleaned up after itself. Updates in v4 ============= * Properly updated the patch outline. * Jonathan Tan asked for more tests, and this revealed some interesting behaviors which I have now either fixed or made explicit: 1. In "all" mode, we try to download and apply all bundles. Do not fail if a single bundle download fails. 2. Previously, not all bundles were being applied, and this was noticed by the added checks for the refs/bundles/* refs at the end of the tests. This revealed the need for removing the reachability walk from verify_bundle() since the written refs/bundles/* refs were not being picked up by the loose ref cache. Since removing the reachability walk seemed like the faster (for users) option, I went that direction. 3. While running those tests and examining the output carefully, I noticed several error messages related to missing prerequisites due to attempting unbundling in a random order. This doesn't appear in the later creationToken version, so I hadn't noticed it at the tip of my local work. These messages are removed with a new quiet mode for verify_bundle(). Updates in v3 ============= * Fixed a comment about a return value of -1. * Fixed and tested scenario where early URIs fail in "any" mode and Git should try the rest of the list. * Instead of using 'success_count' and 'failure_count', use the iterator return value to terminate the "all" mode loop early. Updates in v2 ============= Thank you to all of the voices who chimed in on the previous version. I'm sorry it took so long for me to get a new version. * I've done a rather thorough overhaul to minimize how often later patches rewrite portions of earlier patches. * We no longer use a strbuf in struct remote_bundle_info. Instead, use a 'char *' and only in the patch where it is first used. * The config documentation is more clearly indicating that the bundle.* section has no effect in the repository config (at the moment, which will change in the next series). * The bundle.version value is now parsed using git_parse_int(). * The config key is now parsed using parse_config_key(). * Commit messages clarify more about the context of the change in the bigger picture of the bundle URI effort. * Some printf()s are correctly changed to fprintf()s. * The test helper CLI is unified across the two modes. They both take a filename now. * The count of downloaded bundles is now only updated after a successful download, allowing the "any" mode to keep trying after a failure. Thanks, * Stolee Derrick Stolee (10): bundle-uri: use plain string in find_temp_filename() bundle-uri: create bundle_list struct and helpers bundle-uri: create base key-value pair parsing bundle-uri: parse bundle list in config format bundle-uri: limit recursion depth for bundle lists bundle: properly clear all revision flags bundle-uri: fetch a list of bundles bundle: add flags to verify_bundle() bundle-uri: quiet failed unbundlings bundle-uri: suppress stderr from remote-https Ævar Arnfjörð Bjarmason (2): bundle-uri: create "key=value" line parsing bundle-uri: unit test "key=value" parsing Documentation/config.txt | 2 + Documentation/config/bundle.txt | 24 ++ Makefile | 1 + builtin/bundle.c | 5 +- bundle-uri.c | 458 ++++++++++++++++++++++++++++++-- bundle-uri.h | 93 +++++++ bundle.c | 42 +-- bundle.h | 15 +- config.c | 2 +- config.h | 1 + t/helper/test-bundle-uri.c | 95 +++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t5558-clone-bundle-uri.sh | 275 +++++++++++++++++++ t/t5750-bundle-uri-parse.sh | 171 ++++++++++++ t/test-lib-functions.sh | 11 + transport.c | 2 +- 17 files changed, 1156 insertions(+), 43 deletions(-) create mode 100644 Documentation/config/bundle.txt create mode 100644 t/helper/test-bundle-uri.c create mode 100755 t/t5750-bundle-uri-parse.sh base-commit: e21e663cd1942df29979d3e01f7eacb532727bb7 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1333%2Fderrickstolee%2Fbundle-redo%2Flist-v5 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1333/derrickstolee/bundle-redo/list-v5 Pull-Request: https://github.com/gitgitgadget/git/pull/1333 Range-diff vs v4: 1: 48beccb0f5e = 1: 48beccb0f5e bundle-uri: use plain string in find_temp_filename() 2: f0c4457951c = 2: f0c4457951c bundle-uri: create bundle_list struct and helpers 3: 430e01cd2a4 = 3: 430e01cd2a4 bundle-uri: create base key-value pair parsing 4: cd915d57f3b = 4: cd915d57f3b bundle-uri: create "key=value" line parsing 5: 4d8cac67f66 = 5: 4d8cac67f66 bundle-uri: unit test "key=value" parsing 6: 0ecae3a44b3 = 6: 0ecae3a44b3 bundle-uri: parse bundle list in config format 7: 7e6b32313b0 = 7: 7e6b32313b0 bundle-uri: limit recursion depth for bundle lists -: ----------- > 8: 8dc5a8e4e63 bundle: properly clear all revision flags 9: 6b9c764c6b3 = 9: 51e9b8474fb bundle-uri: fetch a list of bundles 8: 83f2cd893a4 ! 10: fba3a4a117e bundle: add flags to verify_bundle(), skip walk @@ Metadata Author: Derrick Stolee <derrickstolee@xxxxxxxxxx> ## Commit message ## - bundle: add flags to verify_bundle(), skip walk + bundle: add flags to verify_bundle() - The verify_bundle() method checks if a bundle can be applied to a given - repository. This not only verifies that certain commits exist in the - repository, but Git also checks that these commits are reachable. - - This behavior dates back to the original git-bundle builtin written in - 2e0afafebd8 (Add git-bundle: move objects and references by archive, - 2007-02-22), but the message does not go into detail why the - reachability check is important. - - Since verify_bundle() is called from unbundle(), we need to add an - option to pipe the flags through that method. - - When unbundling from a list of bundles, Git will create refs that point - to the tips of the latest bundle, which makes this reachability walk - succeed, in theory. However, the loose refs cache does not get - invalidated and hence the reachability walk fails. By disabling the - reachability walk in the bundle URI code, we can get around this - reachability check. + The verify_bundle() method has a 'verbose' option, but we will want to + extend this method to have more granular control over its output. First, + replace this 'verbose' option with a new 'flags' option with a single + possible value: VERIFY_BUNDLE_VERBOSE. Signed-off-by: Derrick Stolee <derrickstolee@xxxxxxxxxx> @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi + * a reachable ref pointing to the new tips, which will reach + * the prerequisite commits. + */ -+ if ((result = unbundle(r, &header, bundle_fd, NULL, -+ VERIFY_BUNDLE_SKIP_REACHABLE))) ++ if ((result = unbundle(r, &header, bundle_fd, NULL, 0))) return 1; /* @@ bundle.c: static int list_refs(struct string_list *r, int argc, const char **arg /* * Do fast check, then if any prereqs are missing then go line by line @@ bundle.c: int verify_bundle(struct repository *r, - error("%s", message); error("%s %s", oid_to_hex(oid), name); } -- if (revs.pending.nr != p->nr) -+ if (revs.pending.nr != p->nr || -+ (flags & VERIFY_BUNDLE_SKIP_REACHABLE)) - goto cleanup; - req_nr = revs.pending.nr; - setup_revisions(2, argv, &revs, NULL); -@@ bundle.c: int verify_bundle(struct repository *r, - clear_commit_marks(commit, ALL_REV_FLAGS); - } - if (verbose) { + if (flags & VERIFY_BUNDLE_VERBOSE) { @@ bundle.h: int read_bundle_header_fd(int fd, struct bundle_header *header, + +enum verify_bundle_flags { + VERIFY_BUNDLE_VERBOSE = (1 << 0), -+ VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1) +}; + +int verify_bundle(struct repository *r, struct bundle_header *header, 10: 1cae3096624 ! 11: 2e0bfa834f1 bundle-uri: quiet failed unbundlings @@ Commit message Signed-off-by: Derrick Stolee <derrickstolee@xxxxxxxxxx> + ## builtin/bundle.c ## +@@ builtin/bundle.c: static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) { + } + close(bundle_fd); + if (verify_bundle(the_repository, &header, +- quiet ? 0 : VERIFY_BUNDLE_VERBOSE)) { ++ quiet ? VERIFY_BUNDLE_QUIET : VERIFY_BUNDLE_VERBOSE)) { + ret = 1; + goto cleanup; + } + ## bundle-uri.c ## @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *file) + * a reachable ref pointing to the new tips, which will reach * the prerequisite commits. */ - if ((result = unbundle(r, &header, bundle_fd, NULL, -- VERIFY_BUNDLE_SKIP_REACHABLE))) -+ VERIFY_BUNDLE_SKIP_REACHABLE | VERIFY_BUNDLE_QUIET))) +- if ((result = unbundle(r, &header, bundle_fd, NULL, 0))) ++ if ((result = unbundle(r, &header, bundle_fd, NULL, ++ VERIFY_BUNDLE_QUIET))) return 1; /* @@ bundle.h: int create_bundle(struct repository *r, const char *path, enum verify_bundle_flags { VERIFY_BUNDLE_VERBOSE = (1 << 0), -- VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1) -+ VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 1), -+ VERIFY_BUNDLE_QUIET = (1 << 2), ++ VERIFY_BUNDLE_QUIET = (1 << 1), }; int verify_bundle(struct repository *r, struct bundle_header *header, 11: 52a575f8a69 = 12: 5729ff2af4b bundle-uri: suppress stderr from remote-https -- gitgitgadget