This is a small reroll of my series which has a grab-bag of midx-write related cleanups that I pulled out of a larger series to implement incremental MIDX chains. This series is mostly the same as last time, but notable changes include: - renamed `get_sorted_entries()` to `compute_sorted_entries()` to avoid confusion that the function performs a side-effecting write to `ctx->entries_nr`. - fixed a handful of typos - add explanations in a couple of the patches to better motivate the change. Thanks to Patrick Steinhardt for his careful review on the previous round! Taylor Blau (8): midx-write.c: tolerate `--preferred-pack` without bitmaps midx-write.c: reduce argument count for `get_sorted_entries()` midx-write.c: pass `start_pack` to `compute_sorted_entries()` midx-write.c: extract `should_include_pack()` midx-write.c: extract `fill_packs_from_midx()` midx-write.c: support reading an existing MIDX with `packs_to_include` midx: replace `get_midx_rev_filename()` with a generic helper pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper midx-write.c | 161 ++++++++++++++++++------------------ midx.c | 14 ++-- midx.h | 6 +- pack-bitmap.c | 5 +- pack-revindex.c | 3 +- t/t5319-multi-pack-index.sh | 23 ++++++ 6 files changed, 119 insertions(+), 93 deletions(-) Range-diff against v1: 1: c753bc379b ! 1: ad268bd264 midx-write.c: tolerate `--preferred-pack` without bitmaps @@ Commit message unreferenced packs via 'git multi-pack-index expire'). In practice, this isn't possible to trigger when running `git - multi-pack-index write` from via `git repack`, as the latter always - passes `--stdin-packs`, which prevents us from loading an existing MIDX, - as it forces all packs to be read from disk. + multi-pack-index write` from `git repack`, as the latter always passes + `--stdin-packs`, which prevents us from loading an existing MIDX, as it + forces all packs to be read from disk. But a future commit in this series will change that behavior to unconditionally load an existing MIDX, even with `--stdin-packs`, making 2: 07dad5a581 ! 2: 9d422efe6e midx-write.c: reduce argument count for `get_sorted_entries()` @@ Commit message Simplify the declaration of this function by taking a single pointer to the whole `struct write_midx_context` instead of various members within - it. + it. Since this function is now computing the entire result (populating + both `ctx->entries`, and `ctx->entries_nr`), rename it to something that + doesn't start with "get_" to make clear that this function has a + side-effect. Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> @@ midx-write.c: static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout - uint32_t nr_packs, - size_t *nr_objects, - int preferred_pack) -+static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx) ++static void compute_sorted_entries(struct write_midx_context *ctx) { uint32_t cur_fanout, cur_pack, cur_object; size_t alloc_objects, total_objects = 0; struct midx_fanout fanout = { 0 }; - struct pack_midx_entry *deduplicated_entries = NULL; +- struct pack_midx_entry *deduplicated_entries = NULL; - uint32_t start_pack = m ? m->num_packs : 0; + uint32_t start_pack = ctx->m ? ctx->m->num_packs : 0; @@ midx-write.c: static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout /* * As we de-duplicate by fanout value, we expect the fanout @@ midx-write.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, + alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16; ALLOC_ARRAY(fanout.entries, fanout.alloc); - ALLOC_ARRAY(deduplicated_entries, alloc_objects); +- ALLOC_ARRAY(deduplicated_entries, alloc_objects); - *nr_objects = 0; ++ ALLOC_ARRAY(ctx->entries, alloc_objects); + ctx->entries_nr = 0; for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) { @@ midx-write.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pac continue; - ALLOC_GROW(deduplicated_entries, st_add(*nr_objects, 1), -+ ALLOC_GROW(deduplicated_entries, st_add(ctx->entries_nr, 1), ++ ALLOC_GROW(ctx->entries, st_add(ctx->entries_nr, 1), alloc_objects); - memcpy(&deduplicated_entries[*nr_objects], -+ memcpy(&deduplicated_entries[ctx->entries_nr], ++ memcpy(&ctx->entries[ctx->entries_nr], &fanout.entries[cur_object], sizeof(struct pack_midx_entry)); - (*nr_objects)++; @@ midx-write.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pac } } + free(fanout.entries); +- return deduplicated_entries; + } + + static int write_midx_pack_names(struct hashfile *f, void *data) @@ midx-write.c: static int write_midx_internal(const char *object_dir, } } - ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr, - ctx.preferred_pack_idx); -+ ctx.entries = get_sorted_entries(&ctx); ++ compute_sorted_entries(&ctx); ctx.large_offsets_needed = 0; for (i = 0; i < ctx.entries_nr; i++) { 3: 7acf4557dc ! 3: e81296f8cc midx-write.c: pass `start_pack` to `get_sorted_entries()` @@ Metadata Author: Taylor Blau <me@xxxxxxxxxxxx> ## Commit message ## - midx-write.c: pass `start_pack` to `get_sorted_entries()` + midx-write.c: pass `start_pack` to `compute_sorted_entries()` - The function `get_sorted_entries()` is broadly responsible for + The function `compute_sorted_entries()` is broadly responsible for building an array of the objects to be written into a MIDX based on the provided list of packs. @@ Commit message The existing implementation simply skips past the first ctx->m->num_packs (if ctx->m is non-NULL, indicating that we loaded an - existing MIDX). Future changes (outside the scope of this patch series) - to the MIDX code will require us to skip *at most* that number[^1]. + existing MIDX). This is because we read objects in packs from an + existing MIDX via the MIDX itself, rather than from the pack-level + fanout to guarantee a de-duplicated result (see: a40498a1265 (midx: use + existing midx when writing new one, 2018-07-12)). + + Future changes (outside the scope of this patch series) to the MIDX code + will require us to skip *at most* that number[^1]. We could tag each pack with a bit that indicates the pack's contents should be included in the MIDX. But we can just as easily determine the @@ midx-write.c: static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout * Copy only the de-duplicated entries (selected by most-recent modified time * of a packfile containing the object). */ --static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx) -+static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx, -+ uint32_t start_pack) +-static void compute_sorted_entries(struct write_midx_context *ctx) ++static void compute_sorted_entries(struct write_midx_context *ctx, ++ uint32_t start_pack) { uint32_t cur_fanout, cur_pack, cur_object; size_t alloc_objects, total_objects = 0; struct midx_fanout fanout = { 0 }; - struct pack_midx_entry *deduplicated_entries = NULL; - uint32_t start_pack = ctx->m ? ctx->m->num_packs : 0; for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++) @@ midx-write.c: static int write_midx_internal(const char *object_dir, } } -- ctx.entries = get_sorted_entries(&ctx); -+ ctx.entries = get_sorted_entries(&ctx, start_pack); +- compute_sorted_entries(&ctx); ++ compute_sorted_entries(&ctx, start_pack); ctx.large_offsets_needed = 0; for (i = 0; i < ctx.entries_nr; i++) { 4: 3908546ea8 ! 4: 9cd9257465 midx-write.c: extract `should_include_pack()` @@ Commit message - or, appear in the "to_include" list, if invoking the MIDX write machinery with the `--stdin-packs` command-line argument. - In a future commit will want to call a slight variant of these checks - from the code that reuses all packs from an existing MIDX, as well as - the current location via add_pack_to_midx(). The latter will be - modified in subsequent commits to only reuse packs which appear in the - to_include list, if one was given. + A future commit will want to call a slight variant of these checks from + the code that reuses all packs from an existing MIDX, as well as the + current location via add_pack_to_midx(). The latter will be modified in + subsequent commits to only reuse packs which appear in the to_include + list, if one was given. Prepare for that step by extracting these checks as a subroutine that may be called from both places. 5: dc813ea1ca = 5: 1bb890e87c midx-write.c: extract `fill_packs_from_midx()` 6: 61268114c6 ! 6: fe187b1939 midx-write.c: support reading an existing MIDX with `packs_to_include` @@ Commit message reject all of the packs it provided since they appear in an existing MIDX by definition. + The sum total of this change is that we are now able to read and + reference objects in an existing MIDX even when given a non-NULL + `packs_to_include`. This is a prerequisite step for incremental MIDXs, + which need to load any existing MIDX (if one is present) in order to + determine whether or not an object already appears in an earlier portion + of the MIDX to avoid duplicating it across multiple portions. + Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> ## midx-write.c ## 7: f4c0167f43 = 7: f4f977c1c7 midx: replace `get_midx_rev_filename()` with a generic helper 8: 79e3f7f83f = 8: bcadaf9278 pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper base-commit: 3a57aa566a21e7a510c64881bc6bdff7eb397988 -- 2.45.1.321.gbcadaf92783