"blanet via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Xing Xin <xingxin.xx@xxxxxxxxxxxxx>> > When using the bundle-uri mechanism with a bundle list containing > multiple interrelated bundles, we encountered a bug where tips from > downloaded bundles were not being discovered, resulting in rather slow > clones. This was particularly problematic when employing the heuristic > `creationTokens`. > > And this is easy to reproduce. Suppose we have a repository with a > single branch `main` pointing to commit `A`, firstly we create a base > bundle with > > git bundle create base.bundle main > > Then let's add a new commit `B` on top of `A`, so that an incremental > bundle for `main` can be created with > > git bundle create incr.bundle A..main > > Now we can generate a bundle list with the following content: > > [bundle] > version = 1 > mode = all > heuristic = creationToken > > [bundle "base"] > uri = base.bundle > creationToken = 1 > > [bundle "incr"] > uri = incr.bundle > creationToken = 2 > > A fresh clone with the bundle list above would give the expected > `refs/bundles/main` pointing at `B` in new repository, in other words we > already had everything locally from the bundles, but git would still > download everything from server as if we got nothing. > > So why the `refs/bundles/main` is not discovered? After some digging I > found that: > > 1. when unbundling a downloaded bundle, a `verify_bundle` is called to s/a// > check its prerequisites if any. The verify procedure would find oids > so `packed_git` is initialized. > So the flow is: 1. `fetch_bundle_list` fetches all the bundles advertised via `download_bundle_list` to local files. 2. It then calls `unbundle_all_bundles` to unbundle all the bundles. 3. Each bundle is then unbundled using `unbundle_from_file`. 4. Here, we first read the bundle header to get all the prerequisites for the bundle, this is done in `read_bundle_header`. 5. Then we call `unbundle`, which calls `verify_bundle` to ensure that the repository does indeed contain the prerequisites mentioned in the bundle. Then it creates the index from the bundle file. So because the objects are being checked, the `prepare_packed_git` function is eventually called, which means that the `raw_object_store->packed_git` data gets filled in and `packed_git_initialized` is set. This means consecutive calls to `prepare_packed_git` doesn't re-initiate `raw_object_store->packed_git` since `packed_git_initialized` already is set. So your explanation makes sense, as a _nit_ I would perhaps add the part about why consecutive calls to `prepare_packed_git` are ineffective. > 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`, s/unbundled/unbundling > during which `mark_complete_and_common_ref` and `mark_tips` would > find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be > enlisted if `packed_git` has already initialized in 1. > Back to the example above, when unbunding `incr.bundle`, `base.pack` is > enlisted to `packed_git` bacause of the prerequisites to verify. Then we > can not find `B` for negotiation at a latter time bacause `B` exists in > `incr.pack` which is not enlisted in `packed_git`. > > This commit fixes this by adding a `reprepare_packed_git` call for every > successfully unbundled bundle, which ensures to enlist all generated > packs from bundle uri. And a set of negotiation related tests are added. > The solution makes sense. > Signed-off-by: Xing Xin <xingxin.xx@xxxxxxxxxxxxx> > --- > bundle-uri: refresh packed_git if unbundle succeed > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1730 > > bundle-uri.c | 3 + > t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++ > 2 files changed, 132 insertions(+) > > diff --git a/bundle-uri.c b/bundle-uri.c > index ca32050a78f..2b9d36cfd8e 100644 > --- a/bundle-uri.c > +++ b/bundle-uri.c > @@ -7,6 +7,7 @@ > #include "refs.h" > #include "run-command.h" > #include "hashmap.h" > +#include "packfile.h" > #include "pkt-line.h" > #include "config.h" > #include "remote.h" > @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file) > VERIFY_BUNDLE_QUIET))) > return 1; > > + reprepare_packed_git(r); > + > Would it make sense to move this to `bundle.c:unbundle()`, since that is also where the idx is created? [snip]
Attachment:
signature.asc
Description: PGP signature