[PATCH v3 00/19] midx: incremental multi-pack indexes, part one

This series implements incremental MIDXs, which allow for storing
a MIDX across multiple layers, each with their own distinct set of
packs.
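
To illustrate the layering, here is a minimal sketch (not code from this
series) of how a position in the combined object order might be mapped
to a single layer. The `struct midx_layer` type and the
`layer_for_object()` helper are hypothetical, simplified stand-ins. The
field names `base_midx`, `num_objects` and `num_objects_in_base` mirror
those visible in the range-diff below, and positions are assumed to be
zero-based.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one layer of a MIDX chain. */
struct midx_layer {
	struct midx_layer *base_midx;  /* previous (older) layer, or NULL */
	uint32_t num_objects;          /* objects stored in this layer only */
	uint32_t num_objects_in_base;  /* objects in all layers below this one */
};

/*
 * Given the tip of a chain and a zero-based global position, point
 * *layer at the layer holding that object and return the position
 * local to that layer.
 */
static uint32_t layer_for_object(struct midx_layer **layer, uint32_t pos)
{
	struct midx_layer *m = *layer;

	/* The position must exist somewhere in the chain. */
	assert(pos < m->num_objects + m->num_objects_in_base);

	/* Walk towards the base until the position falls in this layer. */
	while (m->base_midx && pos < m->num_objects_in_base)
		m = m->base_midx;

	*layer = m;
	return pos - m->num_objects_in_base;
}
```

With a base layer of 4 objects and a tip layer of 3 objects, global
position 5 resolves to local position 1 in the tip layer, while global
position 2 resolves to local position 2 in the base layer.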

This round is also similar to the previous one, but is rebased on
current 'master' (406f326d27 (The second batch, 2024-08-01)) and has
been updated in response to review from Peff on the previous round.

As usual, a range-diff is below, but the main changes since last time
are as follows:

  - Documentation improvements to clarify what happens when both an
    incremental and a non-incremental MIDX are present in a
    repository.

  - Commit message typo fix in 3/19 to correct an error in one of the
    technical examples.

  - Dropped a custom 'local_pack_int_id' in 4/19 to make the remaining
    diff easier to read.

  - Minor bugfix in 7/19 where we incorrectly terminated the object
    abbreviation disambiguation step for incremental MIDXs.

  - Various additional bits of information in the commit messages to
    explain anything that was subtle.

Thanks in advance for any review! :-)

Taylor Blau (19):
  Documentation: describe incremental MIDX format
  midx: add new fields for incremental MIDX chains
  midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  midx: teach `prepare_midx_pack()` about incremental MIDXs
  midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  midx: introduce `bsearch_one_midx()`
  midx: teach `bsearch_midx()` about incremental MIDXs
  midx: teach `nth_midxed_offset()` about incremental MIDXs
  midx: teach `fill_midx_entry()` about incremental MIDXs
  midx: remove unused `midx_locate_pack()`
  midx: teach `midx_contains_pack()` about incremental MIDXs
  midx: teach `midx_preferred_pack()` about incremental MIDXs
  midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  midx: support reading incremental MIDX chains
  midx: implement verification support for incremental MIDXs
  t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  t/t5313-pack-bounds-checks.sh: prepare for sub-directories
  midx: implement support for writing incremental MIDX chains

 Documentation/git-multi-pack-index.txt       |  11 +-
 Documentation/technical/multi-pack-index.txt | 103 +++++
 builtin/multi-pack-index.c                   |   2 +
 builtin/repack.c                             |   8 +-
 ci/run-build-and-tests.sh                    |   2 +-
 midx-write.c                                 | 326 ++++++++++++---
 midx.c                                       | 405 ++++++++++++++++---
 midx.h                                       |  26 +-
 object-name.c                                |  99 ++---
 packfile.c                                   |  21 +-
 packfile.h                                   |   4 +
 t/README                                     |   6 +-
 t/helper/test-read-midx.c                    |  24 +-
 t/lib-bitmap.sh                              |   6 +-
 t/lib-midx.sh                                |  28 ++
 t/t0410-partial-clone.sh                     |   2 -
 t/t5310-pack-bitmaps.sh                      |   4 -
 t/t5313-pack-bounds-checks.sh                |   8 +-
 t/t5319-multi-pack-index.sh                  |  30 +-
 t/t5326-multi-pack-bitmaps.sh                |   4 +-
 t/t5327-multi-pack-bitmaps-rev.sh            |   6 +-
 t/t5332-multi-pack-reuse.sh                  |   2 +
 t/t5334-incremental-multi-pack-index.sh      |  46 +++
 t/t7700-repack.sh                            |  48 +--
 24 files changed, 960 insertions(+), 261 deletions(-)
 create mode 100755 t/t5334-incremental-multi-pack-index.sh

Range-diff against v2:
 1:  014588b3ec !  1:  90b21b11ed Documentation: describe incremental MIDX format
    @@ Documentation/technical/multi-pack-index.txt: Design Details
     +extending the incremental MIDX format to support reachability bitmaps.
     +The design below specifically takes this into account, and support for
     +reachability bitmaps will be added in a future patch series. It is
    -+omitted from this series for the same reason as above.
    ++omitted from the current implementation for the same reason as above.
     ++
     +In brief, to support reachability bitmaps with the incremental MIDX
     +feature, the concept of the pseudo-pack order is extended across each
    @@ Documentation/technical/multi-pack-index.txt: Design Details
     +multi-pack-index chain. The `multi-pack-index-$H2.midx` file contains
     +the second layer of the chain, and so on.
     +
    ++When both an incremental- and non-incremental MIDX are present, the
    ++non-incremental MIDX is always read first.
    ++
     +=== Object positions for incremental MIDXs
     +
     +In the original multi-pack-index design, we refer to objects via their
 2:  337ebc6de7 =  2:  0d3b19c59f midx: add new fields for incremental MIDX chains
 3:  f449a72877 !  3:  5cd742b677 midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
    @@ Commit message
           objects contained in all layers of the incremental MIDX chain, not any
           particular layer. For example, consider MIDX chain with two individual
           MIDXs, one with 4 objects and another with 3 objects. If the MIDX with
    -      4 objects appears earlier in the chain, then asking for pack "6" would
    +      4 objects appears earlier in the chain, then asking for object 6 would
           return the second object in the MIDX with 3 objects.
     
         [^2]: Building on the previous example, asking for object 6 in a MIDX
 4:  f88569c819 !  4:  372104c73d midx: teach `prepare_midx_pack()` about incremental MIDXs
    @@ midx.c: static uint32_t midx_for_object(struct multi_pack_index **_m, uint32_t p
      		die(_("bad pack-int-id: %u (%u total packs)"),
     -		    pack_int_id, m->num_packs);
     +		    pack_int_id, m->num_packs + m->num_packs_in_base);
    - 
    --	if (m->packs[pack_int_id])
    ++
     +	*_m = m;
     +
     +	return pack_int_id - m->num_packs_in_base;
    @@ midx.c: static uint32_t midx_for_object(struct multi_pack_index **_m, uint32_t p
     +{
     +	struct strbuf pack_name = STRBUF_INIT;
     +	struct packed_git *p;
    -+	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
     +
    -+	if (m->packs[local_pack_int_id])
    ++	pack_int_id = midx_for_pack(&m, pack_int_id);
    + 
    + 	if (m->packs[pack_int_id])
      		return 0;
    - 
    - 	strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
    --		    m->pack_names[pack_int_id]);
    -+		    m->pack_names[local_pack_int_id]);
    - 
    - 	p = add_packed_git(pack_name.buf, pack_name.len, m->local);
    - 	strbuf_release(&pack_name);
    -@@ midx.c: int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t
    - 		return 1;
    - 
    - 	p->multi_pack_index = 1;
    --	m->packs[pack_int_id] = p;
    -+	m->packs[local_pack_int_id] = p;
    - 	install_packed_git(r, p);
    - 	list_add_tail(&p->mru, &r->objects->packed_git_mru);
    - 
 5:  ec57ff4349 =  5:  e68a3ceff9 midx: teach `nth_midxed_object_oid()` about incremental MIDXs
 6:  650b8c8c21 !  6:  ff2d7bc5ca midx: teach `nth_bitmapped_pack()` about incremental MIDXs
    @@ Commit message
         ID. Likewise, when reading the 'BTMP' chunk, use the MIDX-local offset
         when accessing the data within that chunk.
     
    +    (Note that the both the call to prepare_midx_pack() and the assignment
    +    of bp->pack_int_id both care about the global pack_int_id, so avoid
    +    shadowing the given 'pack_int_id' parameter).
    +
         Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
     
      ## midx.c ##
 7:  bfd1dadbf1 !  7:  32c3fceada midx: introduce `bsearch_one_midx()`
    @@ object-name.c: static int match_hash(unsigned len, const unsigned char *a, const
      
     -	if (!num)
     -		return;
    -+		num = m->num_objects + m->num_objects_in_base;
    ++		if (!m->num_objects)
    ++			continue;
      
     -	bsearch_midx(&ds->bin_pfx, m, &first);
    -+		if (!num)
    -+			continue;
    ++		num = m->num_objects + m->num_objects_in_base;
      
     -	/*
     -	 * At this point, "first" is the location of the lowest object
 8:  38bd45bd24 =  8:  16db6c98ce midx: teach `bsearch_midx()` about incremental MIDXs
 9:  342ed56033 =  9:  761c7c59ba midx: teach `nth_midxed_offset()` about incremental MIDXs
10:  2b335c45ae = 10:  8366456d29 midx: teach `fill_midx_entry()` about incremental MIDXs
11:  22de5898f3 = 11:  909d927c47 midx: remove unused `midx_locate_pack()`
12:  fb60f2b022 = 12:  71127601b5 midx: teach `midx_contains_pack()` about incremental MIDXs
13:  38b642d404 = 13:  2f98ebb141 midx: teach `midx_preferred_pack()` about incremental MIDXs
14:  594386da10 ! 14:  550ae2dc93 midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
    @@ Commit message
             MIDX layers when dealing with an incremental MIDX chain by calling
             itself when given a MIDX with a non-NULL `base_midx`.
     
    +    Note that after 0c5a62f14b (midx-write.c: do not read existing MIDX with
    +    `packs_to_include`, 2024-06-11), we do not use this function with an
    +    existing MIDX (incremental or not) when generating a MIDX with
    +    --stdin-packs, and likewise for incremental MIDXs.
    +
    +    But it is still used when adding the fanout table from an incremental
    +    MIDX when generating a non-incremental MIDX (without --stdin-packs, of
    +    course).
    +
         Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
     
      ## midx-write.c ##
15:  dad130799c ! 15:  9ae1bc415e midx: support reading incremental MIDX chains
    @@ Commit message
         in the commit after next.)
     
         The core of this change involves following the order specified in the
    -    MIDX chain and opening up MIDXs in the chain one-by-one, adding them to
    -    the previous layer's `->base_midx` pointer at each step.
    +    MIDX chain in reverse and opening up MIDXs in the chain one-by-one,
    +    adding them to the previous layer's `->base_midx` pointer at each step.
     
         In order to implement this, the `load_multi_pack_index()` function is
         taught to call a new `load_multi_pack_index_chain()` function if loading
16:  ad976ef413 = 16:  3d4181df51 midx: implement verification support for incremental MIDXs
17:  23912425bf = 17:  3b268f91bf t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
18:  814da1916d = 18:  09d74f8942 t/t5313-pack-bounds-checks.sh: prepare for sub-directories
19:  e2b5961b45 = 19:  5d467d38a8 midx: implement support for writing incremental MIDX chains

base-commit: 406f326d271e0bacecdb00425422c5fa3f314930
-- 
2.46.0.46.g406f326d27.dirty



