[PATCH v2 00/24] pack-bitmap: bitmap generation improvements

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Here is an updated version of this series, which improves the
performance of generating reachability bitmaps in large repositories.

Not very much has changed since last time, but a range-diff is below
nonetheless. The major changes are:

  - Avoid an overflow when bounds checking in the second and third
    patches (thanks, Martin, for noticing).
  - Incorporate a fix to avoid reading beyond an EWAH bitmap by double
    checking our read before actually doing it (thanks, Peff).
  - Harden the tests so that they pass under sha256-mode (thanks SZEDER,
    and Peff).

Derrick Stolee (9):
  pack-bitmap-write: fill bitmap with commit history
  bitmap: add bitmap_diff_nonzero()
  commit: implement commit_list_contains()
  t5310: add branch-based checks
  pack-bitmap-write: rename children to reverse_edges
  pack-bitmap-write: build fewer intermediate bitmaps
  pack-bitmap-write: use existing bitmaps
  pack-bitmap-write: relax unique rewalk condition
  pack-bitmap-write: better reuse bitmaps

Jeff King (11):
  pack-bitmap: fix header size check
  pack-bitmap: bounds-check size of cache extension
  t5310: drop size of truncated ewah bitmap
  rev-list: die when --test-bitmap detects a mismatch
  ewah: factor out bitmap growth
  ewah: make bitmap growth less aggressive
  ewah: implement bitmap_or()
  ewah: add bitmap_dup() function
  pack-bitmap-write: reimplement bitmap writing
  pack-bitmap-write: pass ownership of intermediate bitmaps
  pack-bitmap-write: ignore BITMAP_FLAG_REUSE

Taylor Blau (4):
  ewah/ewah_bitmap.c: grow buffer past 1
  pack-bitmap.c: check reads more aggressively when loading
  pack-bitmap: factor out 'bitmap_for_commit()'
  pack-bitmap: factor out 'add_commit_to_bitmap()'

 builtin/pack-objects.c  |   1 -
 commit.c                |  11 +
 commit.h                |   2 +
 ewah/bitmap.c           |  54 ++++-
 ewah/ewah_bitmap.c      |   2 +-
 ewah/ewok.h             |   3 +-
 pack-bitmap-write.c     | 452 +++++++++++++++++++++++++---------------
 pack-bitmap.c           | 139 ++++++------
 pack-bitmap.h           |   8 +-
 t/t5310-pack-bitmaps.sh | 164 ++++++++++++---
 10 files changed, 555 insertions(+), 281 deletions(-)

Range-diff against v1:
 -:  ---------- >  1:  07054ff8ee ewah/ewah_bitmap.c: grow buffer past 1
 1:  1970a70207 !  2:  74a13b4a6e pack-bitmap: fix header size check
    @@ pack-bitmap.c: static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *ind
     +	size_t header_size = sizeof(*header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz;

     -	if (index->map_size < sizeof(*header) + the_hash_algo->rawsz)
    -+	if (index->map_size < header_size)
    - 		return error("Corrupted bitmap index (missing header data)");
    +-		return error("Corrupted bitmap index (missing header data)");
    ++	if (index->map_size < header_size + the_hash_algo->rawsz)
    ++		return error("Corrupted bitmap index (too small)");

      	if (memcmp(header->magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)) != 0)
    + 		return error("Corrupted bitmap index file (wrong header)");
     @@ pack-bitmap.c: static int load_bitmap_header(struct bitmap_index *index)
      	}

 2:  36b1815d03 !  3:  db11116dac pack-bitmap: bounds-check size of cache extension
    @@ Commit message
         pack-bitmap: bounds-check size of cache extension

         A .bitmap file may have a "name hash cache" extension, which puts a
    -    sequence of uint32_t bytes (one per object) at the end of the file. When
    -    we see a flag indicating this extension, we blindly subtract the
    +    sequence of uint32_t values (one per object) at the end of the file.
    +    When we see a flag indicating this extension, we blindly subtract the
         appropriate number of bytes from our available length. However, if the
         .bitmap file is too short, we'll underflow our length variable and wrap
         around, thinking we have a very large length. This can lead to reading
    @@ pack-bitmap.c: static int load_bitmap_header(struct bitmap_index *index)
      	/* Parse known bitmap format options */
      	{
      		uint32_t flags = ntohs(header->options);
    -+		uint32_t cache_size = st_mult(index->pack->num_objects, sizeof(uint32_t));
    ++		size_t cache_size = st_mult(index->pack->num_objects, sizeof(uint32_t));
     +		unsigned char *index_end = index->map + index->map_size - the_hash_algo->rawsz;

      		if ((flags & BITMAP_OPT_FULL_DAG) == 0)
    @@ pack-bitmap.c: static int load_bitmap_header(struct bitmap_index *index)
      		if (flags & BITMAP_OPT_HASH_CACHE) {
     -			unsigned char *end = index->map + index->map_size - the_hash_algo->rawsz;
     -			index->hashes = ((uint32_t *)end) - index->pack->num_objects;
    -+			if (index->map + header_size + cache_size > index_end)
    ++			if (cache_size > index_end - index->map - header_size)
     +				return error("corrupted bitmap index file (too short to fit hash cache)");
     +			index->hashes = (void *)(index_end - cache_size);
     +			index_end -= cache_size;
 3:  edfec2ea62 =  4:  f779e76f82 t5310: drop size of truncated ewah bitmap
 4:  f3fec466f7 =  5:  1a9ac1c4ae rev-list: die when --test-bitmap detects a mismatch
 5:  b35012f44d =  6:  9bb1ea3b19 ewah: factor out bitmap growth
 6:  53b8bea98c =  7:  f8426c7e8b ewah: make bitmap growth less aggressive
 7:  98e3bfc1b2 =  8:  674e31f98e ewah: implement bitmap_or()
 8:  1bd115fc51 =  9:  a903c949d8 ewah: add bitmap_dup() function
 9:  adf16557c2 = 10:  c951206729 pack-bitmap-write: reimplement bitmap writing
10:  27992687c9 = 11:  466dd3036a pack-bitmap-write: pass ownership of intermediate bitmaps
11:  d92fb0e1e1 = 12:  8e5607929d pack-bitmap-write: fill bitmap with commit history
12:  bf86cb6196 = 13:  4840c64c51 bitmap: add bitmap_diff_nonzero()
13:  78cdf847aa = 14:  63e846f4e8 commit: implement commit_list_contains()
14:  778e9e9c44 = 15:  8b5d239333 t5310: add branch-based checks
15:  526d3509ef = 16:  60a46091bb pack-bitmap-write: rename children to reverse_edges
 -:  ---------- > 17:  8f7bb2dd2e pack-bitmap.c: check reads more aggressively when loading
16:  86d77fd085 ! 18:  5262daa330 pack-bitmap-write: build fewer intermediate bitmaps
    @@ t/t5310-pack-bitmaps.sh: test_expect_success 'setup repo with moderate-sized his
      '

      test_expect_success 'rev-list --test-bitmap verifies bitmaps' '
    +@@ t/t5310-pack-bitmaps.sh: test_expect_success 'truncated bitmap fails gracefully (ewah)' '
    + 	git rev-list --use-bitmap-index --count --all >expect &&
    + 	bitmap=$(ls .git/objects/pack/*.bitmap) &&
    + 	test_when_finished "rm -f $bitmap" &&
    +-	test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
    ++	test_copy_bytes 270 <$bitmap >$bitmap.tmp &&
    + 	mv -f $bitmap.tmp $bitmap &&
    + 	git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
    + 	test_cmp expect actual &&
17:  e4f296100c = 19:  a206f48614 pack-bitmap-write: ignore BITMAP_FLAG_REUSE
18:  6e856bcf75 = 20:  9928b3c7da pack-bitmap: factor out 'bitmap_for_commit()'
19:  9b5f595f50 = 21:  f40a39a48a pack-bitmap: factor out 'add_commit_to_bitmap()'
20:  c458f98e11 = 22:  4bf5e78a54 pack-bitmap-write: use existing bitmaps
21:  3026876e7a = 23:  1da4fa0fb8 pack-bitmap-write: relax unique rewalk condition
22:  ce2716e291 = 24:  42399a1c2e pack-bitmap-write: better reuse bitmaps
--
2.29.2.312.gabc4d358d8



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux