Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/12/2022 2:51 PM, Abhradeep Chakraborty wrote:

> I think I have found the problem. Derrick was right that `mtime` part
> is not the culprit. I tried to understand the whole midx workflow and
> some questions were raised in my mind. I don't know whether those are
> features or bugs (because I do not have much experience in
> multi-pack-index code).
> 
> I am writing a brief description for the context of the issue and the
> questions I have.

Thanks for the detailed writeup.

> Let us start from the `write_midx_internal()` function. As
> `packs_to_include` is null in our case, We can use the old midx to
> write a new midx file. The line `ctx.m =
> lookup_multi_pack_index(the_repository, object_dir)`[1]  does this. It
> also loads packs that do not belong to any multi-pack-indexes. It also
> sets `the_repository->objects->packed_git_intialized` to 1.  If we
> look at our test case (`setup partial bitmap`) the last `.pack` file
> (generated by `git repack &&` ) does not belong to any midx. So, that
> pack will be loaded in this step.
> 
> [1] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L1169
> 
> Next let us move to the `if (ctx.m)`[2] block. As we will be writing a
> bitmap, `if (flags & MIDX_WRITE_REV_INDEX)` is true. Thus all packs
> related to the old midx are loaded and `ctx.info[ctx.nr].p` stores the
> pointers of these packs.
> 
> [2] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L1182
> 
> After that we come to the `for_each_file_in_pack_dir(object_dir,
> add_pack_to_midx, &ctx);` line[3] . The `add_pack_to_midx`[4] function
> adds packs (that are not in the old midx) to `ctx.info`. Now I have a
> question here - Why are we using  the `add_packed_git()`[5] function
> provided we already loaded those packs in the
> `lookup_multi_pack_index` step (i.e. 1st step)? These packs are not
> added in `r->objects->packed_git`. This question is related to our
> current issue.
> 
> I.e. instead of this -
> 
>    ctx->info[ctx->nr].p = add_packed_git(full_path,
> 
> full_path_len, 0);
> 
> Why not this (or similar) -
> 
>     for (cp = the_repository->objects->packed_git; cp; cp = cp->next)
>         if (!cmp_idx_or_pack_name(cp->pack_name, full_path))
>             ctx->info[ctx->nr].p = cp;
> 
> [3] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L1221
> [4] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L462
> [5] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/midx.c#L492
> 
>  `write_midx_bitmap()` function is where bitmap related code starts.
> let us directly jump into the `prepare_packed_git()` function (called
> by `get_all_packs()`[6]). As I said previously,
> `r->objects->packed_git_initialized` is already enabled so this
> function becomes a no-op function. Which means it does not load the
> newly written midx (by calling `prepare_multi_pack_index_one`
> function) and uses old midx to write the bitmap (though we still have
> new packs and they can be used with the old midx to generate the
> bitmap, maybe?) . Here comes my second question - Is this the desired
> case? or should we use the new midx to write the bitmaps?

The confusing part of all this is that the bitmaps are being written
while the "new" midx is written only to "multi-pack-index.lock" and
has not been renamed to "multi-pack-index". If we renamed first, then
the old .bitmap file would not match the new midx and all Git commands
would act as if there was no .bitmap file.
 
> One important point to note is that `get_all_packs()` returns
> `r->objects->packed_git` which now stores pointers of all the packs
> and only these packfiles have their `->index` set.
> 
> [6] https://github.com/git/git/blob/5502f77b6944eda8e26813d8f542cffe7d110aea/packfile.c#L1043
> 
> Now let us move to the last function - `oe_set_in_pack()` (called by
> `prepare_midx_packing_data()`). Note that, we are passing
> `ctx->info[ctx->pack_perm[from->pack_int_id]].p` along with other
> parameters. As I have said in an earlier para (containing my first
> question), `ctx->info` has some packs (i.e. newer packs that are not
> related to the old midx) that are not installed in
> `r->objects->packed_git` . In other words, we have two instances of
> the same pack file - one in `r->objects->packed_git` list and another
> in `ctx->info[id].p`. As `prepare_in_pack_by_idx` function only sets
> `->index` for `r->objects->packed_git` packs, these packs (i.e.
> `ctx.info[id].p`) do not have their p->index set and thus end up
> calling the `oe_map_new_pack` function.

So really, the problem is that we are handling the r->objects->packed_git
list instead of an array of packs that are under the control of the new
midx. This assumption is baked deep in the pack-objects flow, so it
would be hard to separate this idea.

Perhaps doing the reprepare_packed_git() to regenerate the list would be
sufficient as a band-aid for now, but we would want to later do the big
dig of focusing the pack_data struct to a specific list of pack-files
(by default the set from get_all_packs(), but for midx bitmaps we can
supply a specific set of packs).

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux