Re:Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed

"Xing Xin" <bupt_xingxin@xxxxxxx> · Mon, 20 May 2024 17:41:12 +0800 (CST)

At 2024-05-17 13:00:49, "Patrick Steinhardt" <ps@xxxxxx> wrote:
>On Wed, May 15, 2024 at 03:01:09AM +0000, blanet via GitGitGadget wrote:
>> From: Xing Xin <xingxin.xx@xxxxxxxxxxxxx>
>
>Long time no see :)

Glad to see you again here :)

>> 
>> So why the `refs/bundles/main` is not discovered? After some digging I
>> found that:
>> 
>> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to
>>    check its prerequisites if any. The verify procedure would find oids
>>    so `packed_git` is initialized.
>> 
>> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
>>    during which `mark_complete_and_common_ref` and `mark_tips` would
>>    find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
>>    enlisted if `packed_git` has already initialized in 1.
>
>And I assume we do not want it to not use `OBJECT_INFO_QUICK`?

I think so. For clones or fetches without using bundle-uri, we can hardly
encounter the case that new packs are added during the negotiation process.
So using `OBJECT_INFO_QUICK` can get some performance gain.

>
>> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
>> enlisted to `packed_git` bacause of the prerequisites to verify. Then we
>> can not find `B` for negotiation at a latter time bacause `B` exists in
>> `incr.pack` which is not enlisted in `packed_git`.
>
>Okay, the explanation feels sensible.
>
>> This commit fixes this by adding a `reprepare_packed_git` call for every
>> successfully unbundled bundle, which ensures to enlist all generated
>> packs from bundle uri. And a set of negotiation related tests are added.
>
>This makes me wonder though. Do we really need to call
>`reprepare_packed_git()` once for every bundle, or can't we instead call
>it once at the end once we have fetched all bundles? Reading on.
>
>> Signed-off-by: Xing Xin <xingxin.xx@xxxxxxxxxxxxx>
>> ---
>>     bundle-uri: refresh packed_git if unbundle succeed
>> 
>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
>> Pull-Request: https://github.com/gitgitgadget/git/pull/1730
>> 
>>  bundle-uri.c                |   3 +
>>  t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
>>  2 files changed, 132 insertions(+)
>> 
>> diff --git a/bundle-uri.c b/bundle-uri.c
>> index ca32050a78f..2b9d36cfd8e 100644
>> --- a/bundle-uri.c
>> +++ b/bundle-uri.c
>> @@ -7,6 +7,7 @@
>>  #include "refs.h"
>>  #include "run-command.h"
>>  #include "hashmap.h"
>> +#include "packfile.h"
>>  #include "pkt-line.h"
>>  #include "config.h"
>>  #include "remote.h"
>> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
>>  			       VERIFY_BUNDLE_QUIET)))
>>  		return 1;
>>  
>> +	reprepare_packed_git(r);
>> +
>
>So what's hidden here is that `unbundle_from_file()` will also try to
>access the bundle's refs right away. Surprisingly, we do so by calling
>`refs_update_ref()` with `REF_SKIP_OID_VERIFICATION`, which has the
>effect that we basically accept arbitrary object IDs here even if we do
>not know those. That's why we didn't have to `reprepare_packed_git()`
>before this change.

You are right! I tried dropping this `REF_SKIP_OID_VERIFICATION` flag and
the negotiation works as expected.

After some further digging I find that without `REF_SKIP_OID_VERIFICATION`,
both `write_ref_to_lockfile` for files backend and `reftable_be_transaction_prepare`
for reftable backend would call `parse_object` to check the oid. `parse_object`
can help refresh `packed_git` via `reprepare_packed_git`.

>
>Now there are two conflicting thoughts here:
>
>  - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
>    should now be accessible.
>
>  - Or we can avoid calling `reprepare_packed_git()` inside the loop and
>    instead call it once after we have fetched all bundles.
>
>The second one feels a bit like premature optimization to me. But the
>first item does feel like it could help us to catch broken bundles
>because we wouldn't end up creating refs for objects that neither we nor
>the bundle have.

I favor the first approach because a validation on the object IDs we are
writing is a safe guard . And the flag itself was designed to be used in
testing scenarios.

/*
 * Blindly write an object_id. This is useful for testing data corruption
 * scenarios.
 */
#define REF_SKIP_OID_VERIFICATION (1 << 10)