At 2024-05-17 15:36:53, "Karthik Nayak" <karthik.188@xxxxxxxxx> wrote: >"blanet via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > >> From: Xing Xin <xingxin.xx@xxxxxxxxxxxxx>> >> When using the bundle-uri mechanism with a bundle list containing >> multiple interrelated bundles, we encountered a bug where tips from >> downloaded bundles were not being discovered, resulting in rather slow >> clones. This was particularly problematic when employing the heuristic >> `creationTokens`. >> >> And this is easy to reproduce. Suppose we have a repository with a >> single branch `main` pointing to commit `A`, firstly we create a base >> bundle with >> >> git bundle create base.bundle main >> >> Then let's add a new commit `B` on top of `A`, so that an incremental >> bundle for `main` can be created with >> >> git bundle create incr.bundle A..main >> >> Now we can generate a bundle list with the following content: >> >> [bundle] >> version = 1 >> mode = all >> heuristic = creationToken >> >> [bundle "base"] >> uri = base.bundle >> creationToken = 1 >> >> [bundle "incr"] >> uri = incr.bundle >> creationToken = 2 >> >> A fresh clone with the bundle list above would give the expected >> `refs/bundles/main` pointing at `B` in new repository, in other words we >> already had everything locally from the bundles, but git would still >> download everything from server as if we got nothing. >> >> So why the `refs/bundles/main` is not discovered? After some digging I >> found that: >> >> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to > >s/a// Thanks! > >> check its prerequisites if any. The verify procedure would find oids >> so `packed_git` is initialized. >> > >So the flow is: >1. `fetch_bundle_list` fetches all the bundles advertised via >`download_bundle_list` to local files. >2. It then calls `unbundle_all_bundles` to unbundle all the bundles. >3. Each bundle is then unbundled using `unbundle_from_file`. >4. Here, we first read the bundle header to get all the prerequisites >for the bundle, this is done in `read_bundle_header`. >5. Then we call `unbundle`, which calls `verify_bundle` to ensure that >the repository does indeed contain the prerequisites mentioned in the >bundle. Then it creates the index from the bundle file. > >So because the objects are being checked, the `prepare_packed_git` >function is eventually called, which means that the >`raw_object_store->packed_git` data gets filled in and >`packed_git_initialized` is set. > >This means consecutive calls to `prepare_packed_git` doesn't >re-initiate `raw_object_store->packed_git` since >`packed_git_initialized` already is set. > >So your explanation makes sense, as a _nit_ I would perhaps add the part >about why consecutive calls to `prepare_packed_git` are ineffective. Thanks my friend, you have expressed this issue more clearly. I will post a new description based on your explanation with the creationToken case covered. > >> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`, > >s/unbundled/unbundling Copy that. > >> +#include "packfile.h" >> #include "pkt-line.h" >> #include "config.h" >> #include "remote.h" >> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file) >> VERIFY_BUNDLE_QUIET))) >> return 1; >> >> + reprepare_packed_git(r); >> + >> > >Would it make sense to move this to `bundle.c:unbundle()`, since that is >also where the idx is created? > I wonder if we need a mental model that we should `reprepare_packed_git` that when a new pack and its corresponding idx is generated? Currently whether to call `reprepare_packed_git` is determined by the caller. But within the scope of this bug, I tend to remove the `REF_SKIP_OID_VERIFICATION` flag when writing refs as Patrick suggested. Xing Xin