Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"blanet via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Xing Xin <xingxin.xx@xxxxxxxxxxxxx>>
> When using the bundle-uri mechanism with a bundle list containing
> multiple interrelated bundles, we encountered a bug where tips from
> downloaded bundles were not being discovered, resulting in rather slow
> clones. This was particularly problematic when employing the heuristic
> `creationTokens`.
>
> And this is easy to reproduce. Suppose we have a repository with a
> single branch `main` pointing to commit `A`, firstly we create a base
> bundle with
>
>   git bundle create base.bundle main
>
> Then let's add a new commit `B` on top of `A`, so that an incremental
> bundle for `main` can be created with
>
>   git bundle create incr.bundle A..main
>
> Now we can generate a bundle list with the following content:
>
>   [bundle]
>       version = 1
>       mode = all
>       heuristic = creationToken
>
>   [bundle "base"]
>       uri = base.bundle
>       creationToken = 1
>
>   [bundle "incr"]
>       uri = incr.bundle
>       creationToken = 2
>
> A fresh clone with the bundle list above would give the expected
> `refs/bundles/main` pointing at `B` in new repository, in other words we
> already had everything locally from the bundles, but git would still
> download everything from server as if we got nothing.
>
> So why the `refs/bundles/main` is not discovered? After some digging I
> found that:
>
> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to

s/a//

>    check its prerequisites if any. The verify procedure would find oids
>    so `packed_git` is initialized.
>

So the flow is:
1. `fetch_bundle_list` fetches all the bundles advertised via
`download_bundle_list` to local files.
2. It then calls `unbundle_all_bundles` to unbundle all the bundles.
3. Each bundle is then unbundled using `unbundle_from_file`.
4. Here, we first read the bundle header to get all the prerequisites
for the bundle, this is done in `read_bundle_header`.
5. Then we call `unbundle`, which calls `verify_bundle` to ensure that
the repository does indeed contain the prerequisites mentioned in the
bundle. Then it creates the index from the bundle file.

So because the objects are being checked, the `prepare_packed_git`
function is eventually called, which means that the
`raw_object_store->packed_git` data gets filled in and
`packed_git_initialized` is set.

This means consecutive calls to `prepare_packed_git` doesn't
re-initiate `raw_object_store->packed_git` since
`packed_git_initialized` already is set.

So your explanation makes sense, as a _nit_ I would perhaps add the part
about why consecutive calls to `prepare_packed_git` are ineffective.

> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,

s/unbundled/unbundling

>    during which `mark_complete_and_common_ref` and `mark_tips` would
>    find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
>    enlisted if `packed_git` has already initialized in 1.
> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
> enlisted to `packed_git` bacause of the prerequisites to verify. Then we
> can not find `B` for negotiation at a latter time bacause `B` exists in
> `incr.pack` which is not enlisted in `packed_git`.
>
> This commit fixes this by adding a `reprepare_packed_git` call for every
> successfully unbundled bundle, which ensures to enlist all generated
> packs from bundle uri. And a set of negotiation related tests are added.
>

The solution makes sense.

> Signed-off-by: Xing Xin <xingxin.xx@xxxxxxxxxxxxx>
> ---
>     bundle-uri: refresh packed_git if unbundle succeed
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1730
>
>  bundle-uri.c                |   3 +
>  t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 132 insertions(+)
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index ca32050a78f..2b9d36cfd8e 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -7,6 +7,7 @@
>  #include "refs.h"
>  #include "run-command.h"
>  #include "hashmap.h"
> +#include "packfile.h"
>  #include "pkt-line.h"
>  #include "config.h"
>  #include "remote.h"
> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
>  			       VERIFY_BUNDLE_QUIET)))
>  		return 1;
>
> +	reprepare_packed_git(r);
> +
>

Would it make sense to move this to `bundle.c:unbundle()`, since that is
also where the idx is created?

[snip]

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux