At 2024-05-17 13:00:49, "Patrick Steinhardt" <ps@xxxxxx> wrote: >On Wed, May 15, 2024 at 03:01:09AM +0000, blanet via GitGitGadget wrote: >> From: Xing Xin <xingxin.xx@xxxxxxxxxxxxx> > >Long time no see :) Glad to see you again here :) >> >> So why the `refs/bundles/main` is not discovered? After some digging I >> found that: >> >> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to >> check its prerequisites if any. The verify procedure would find oids >> so `packed_git` is initialized. >> >> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`, >> during which `mark_complete_and_common_ref` and `mark_tips` would >> find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be >> enlisted if `packed_git` has already initialized in 1. > >And I assume we do not want it to not use `OBJECT_INFO_QUICK`? I think so. For clones or fetches without using bundle-uri, we can hardly encounter the case that new packs are added during the negotiation process. So using `OBJECT_INFO_QUICK` can get some performance gain. > >> Back to the example above, when unbunding `incr.bundle`, `base.pack` is >> enlisted to `packed_git` bacause of the prerequisites to verify. Then we >> can not find `B` for negotiation at a latter time bacause `B` exists in >> `incr.pack` which is not enlisted in `packed_git`. > >Okay, the explanation feels sensible. > >> This commit fixes this by adding a `reprepare_packed_git` call for every >> successfully unbundled bundle, which ensures to enlist all generated >> packs from bundle uri. And a set of negotiation related tests are added. > >This makes me wonder though. Do we really need to call >`reprepare_packed_git()` once for every bundle, or can't we instead call >it once at the end once we have fetched all bundles? Reading on. > >> Signed-off-by: Xing Xin <xingxin.xx@xxxxxxxxxxxxx> >> --- >> bundle-uri: refresh packed_git if unbundle succeed >> >> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1 >> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1 >> Pull-Request: https://github.com/gitgitgadget/git/pull/1730 >> >> bundle-uri.c | 3 + >> t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++ >> 2 files changed, 132 insertions(+) >> >> diff --git a/bundle-uri.c b/bundle-uri.c >> index ca32050a78f..2b9d36cfd8e 100644 >> --- a/bundle-uri.c >> +++ b/bundle-uri.c >> @@ -7,6 +7,7 @@ >> #include "refs.h" >> #include "run-command.h" >> #include "hashmap.h" >> +#include "packfile.h" >> #include "pkt-line.h" >> #include "config.h" >> #include "remote.h" >> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file) >> VERIFY_BUNDLE_QUIET))) >> return 1; >> >> + reprepare_packed_git(r); >> + > >So what's hidden here is that `unbundle_from_file()` will also try to >access the bundle's refs right away. Surprisingly, we do so by calling >`refs_update_ref()` with `REF_SKIP_OID_VERIFICATION`, which has the >effect that we basically accept arbitrary object IDs here even if we do >not know those. That's why we didn't have to `reprepare_packed_git()` >before this change. You are right! I tried dropping this `REF_SKIP_OID_VERIFICATION` flag and the negotiation works as expected. After some further digging I find that without `REF_SKIP_OID_VERIFICATION`, both `write_ref_to_lockfile` for files backend and `reftable_be_transaction_prepare` for reftable backend would call `parse_object` to check the oid. `parse_object` can help refresh `packed_git` via `reprepare_packed_git`. > >Now there are two conflicting thoughts here: > > - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs > should now be accessible. > > - Or we can avoid calling `reprepare_packed_git()` inside the loop and > instead call it once after we have fetched all bundles. > >The second one feels a bit like premature optimization to me. But the >first item does feel like it could help us to catch broken bundles >because we wouldn't end up creating refs for objects that neither we nor >the bundle have. I favor the first approach because a validation on the object IDs we are writing is a safe guard . And the flag itself was designed to be used in testing scenarios. /* * Blindly write an object_id. This is useful for testing data corruption * scenarios. */ #define REF_SKIP_OID_VERIFICATION (1 << 10)