[PATCH v2 0/3] midx: reduce memory pressure while writing bitmaps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



We noticed an instance where writing multi-pack-index bitmaps was taking a
lot of memory. This small change can reduce the memory pressure slightly
(~25%), but more will be needed to significantly reduce the memory pressure.
Such a change would require updating the bitmap writing code to use on-disk
data structures instead. This is particularly tricky when the
multi-pack-index has not been fully written, because we don't want a point
in time where the object store has a new multi-pack-index without a
reachability bitmap.


Updates in v2
=============

To reduce confusion on the lifetime of 'ctx.entries', some refactoring
patches are inserted to first extract the use of 'ctx' out of
write_midx_bitmap() and into write_midx_internal(). This makes the
FREE_AND_NULL() stand out more clearly.

Thanks, -Stolee

Derrick Stolee (3):
  pack-bitmap-write: use const for hashes
  midx: extract bitmap write setup
  midx: reduce memory pressure while writing bitmaps

 midx.c              | 69 +++++++++++++++++++++++++++++----------------
 pack-bitmap-write.c |  2 +-
 pack-bitmap.h       |  2 +-
 3 files changed, 47 insertions(+), 26 deletions(-)


base-commit: 9dd64cb4d310986dd7b8ca7fff92f9b61e0bd21a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1292%2Fderrickstolee%2Fbitmap-memory-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1292/derrickstolee/bitmap-memory-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1292

Range-diff vs v1:

 -:  ----------- > 1:  a09fdbb8f3e pack-bitmap-write: use const for hashes
 1:  9104bc55795 ! 2:  4dfb7ae5112 midx:  reduce memory pressure while writing bitmaps
     @@ Metadata
      Author: Derrick Stolee <derrickstolee@xxxxxxxxxx>
      
       ## Commit message ##
     -    midx:  reduce memory pressure while writing bitmaps
     +    midx: extract bitmap write setup
      
     -    We noticed that some 'git multi-pack-index write --bitmap' processes
     -    were running with very high memory. It turns out that a lot of this
     -    memory is required to store a list of every object in the written
     -    multi-pack-index, with a second copy that has additional information
     -    used for the bitmap writing logic.
     +    The write_midx_bitmap() method is a long method that does a lot of
     +    steps. It requires the write_midx_context struct for use in
     +    prepare_midx_packing_data() and find_commits_for_midx_bitmap(), but
     +    after that only needs the pack_order array.
      
     -    Using 'valgrind --tool=massif' before this change, the following chart
     -    shows how memory load increased and was maintained throughout the
     -    process:
     -
     -        GB
     -    4.102^                                                             ::
     -         |              @  @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: :
     -         |         :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         |      :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         |    :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         |    : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         |    : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -         | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     -       0 +--------------------------------------------------------------->
     -
     -    It turns out that the 'struct write_midx_context' data is persisting
     -    through the life of the process, including the 'entries' array. This
     -    array is used last inside find_commits_for_midx_bitmap() within
     -    write_midx_bitmap(). If we free (and nullify) the array at that point,
     -    we can free a decent chunk of memory before the bitmap logic adds more
     -    to the memory footprint.
     -
     -    Here is the massif memory load chart after this change:
     -
     -        GB
     -    3.111^      #
     -         |      #                              :::::::::::@::::::::::::::@
     -         |      #        ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@
     -         |     @#  :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |  :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -         |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     -       0 +--------------------------------------------------------------->
     -
     -    It is unfortunate that the lifetime of the 'entries' array is less
     -    clear. To make this simpler, I added a few things to try and prevent an
     -    accidental reference:
     -
     -     1. Using FREE_AND_NULL() we will at least get a segfault from reading a
     -        NULL pointer instead of a use-after-free.
     -
     -     2. 'entries_nr' is also set to zero to make any loop that would iterate
     -        over the entries be trivial.
     -
     -     3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does
     -        not get another reference later. This requires adding a local copy
     -        of 'pack_order' giving us a reference that we can use later in the
     -        method.
     -
     -     4. Add significant comments in write_midx_bitmap() and
     -        write_midx_internal() to add warnings for future authors who might
     -        accidentally add references to this cleared memory.
     +    This is a messy, but completely non-functional refactoring. The code is
     +    only being moved around to reduce visibility of the write_midx_context
     +    during the longest part of computing reachability bitmaps.
      
          Signed-off-by: Derrick Stolee <derrickstolee@xxxxxxxxxx>
      
       ## midx.c ##
     -@@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
     - 	struct commit **commits = NULL;
     - 	uint32_t i, commits_nr;
     - 	uint16_t options = 0;
     -+	uint32_t *pack_order;
     - 	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
     - 	int ret;
     +@@ midx.c: static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
     + 	return cb.commits;
     + }
       
     -@@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
     +-static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
     +-			     struct write_midx_context *ctx,
     ++static int write_midx_bitmap(const char *midx_name,
     ++			     const unsigned char *midx_hash,
     ++			     struct packing_data *pdata,
     ++			     struct commit **commits,
     ++			     uint32_t commits_nr,
     ++			     uint32_t *pack_order,
     + 			     const char *refs_snapshot,
     + 			     unsigned flags)
     + {
     +-	struct packing_data pdata;
     +-	struct pack_idx_entry **index;
     +-	struct commit **commits = NULL;
     +-	uint32_t i, commits_nr;
     ++	int ret, i;
     + 	uint16_t options = 0;
     +-	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
     +-	int ret;
     +-
     +-	if (!ctx->entries_nr)
     +-		BUG("cannot write a bitmap without any objects");
     ++	struct pack_idx_entry **index;
     ++	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name,
     ++					hash_to_hex(midx_hash));
       
     - 	commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
     + 	if (flags & MIDX_WRITE_BITMAP_HASH_CACHE)
     + 		options |= BITMAP_OPT_HASH_CACHE;
       
     -+	/*
     -+	 * Remove the ctx.entries to reduce memory pressure.
     -+	 * Nullify 'ctx' to help avoid adding new references to ctx->entries.
     -+	 */
     -+	FREE_AND_NULL(ctx->entries);
     -+	ctx->entries_nr = 0;
     -+	pack_order = ctx->pack_order;
     -+	ctx = NULL;
     -+
     +-	prepare_midx_packing_data(&pdata, ctx);
     +-
     +-	commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
     +-
       	/*
       	 * Build the MIDX-order index based on pdata.objects (which is already
       	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
     + 	 * this order).
     + 	 */
     +-	ALLOC_ARRAY(index, pdata.nr_objects);
     +-	for (i = 0; i < pdata.nr_objects; i++)
     +-		index[i] = &pdata.objects[i].idx;
     ++	ALLOC_ARRAY(index, pdata->nr_objects);
     ++	for (i = 0; i < pdata->nr_objects; i++)
     ++		index[i] = &pdata->objects[i].idx;
     + 
     + 	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
     +-	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
     ++	bitmap_writer_build_type_index(pdata, index, pdata->nr_objects);
     + 
     + 	/*
     + 	 * bitmap_writer_finish expects objects in lex order, but pack_order
      @@ midx.c: static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
     + 	 * happens between bitmap_writer_build_type_index() and
       	 * bitmap_writer_finish().
       	 */
     - 	for (i = 0; i < pdata.nr_objects; i++)
     +-	for (i = 0; i < pdata.nr_objects; i++)
      -		index[ctx->pack_order[i]] = &pdata.objects[i].idx;
     -+		index[pack_order[i]] = &pdata.objects[i].idx;
     ++	for (i = 0; i < pdata->nr_objects; i++)
     ++		index[pack_order[i]] = &pdata->objects[i].idx;
       
       	bitmap_writer_select_commits(commits, commits_nr, -1);
     - 	ret = bitmap_writer_build(&pdata);
     +-	ret = bitmap_writer_build(&pdata);
     ++	ret = bitmap_writer_build(pdata);
     + 	if (ret < 0)
     + 		goto cleanup;
     + 
     + 	bitmap_writer_set_checksum(midx_hash);
     +-	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, options);
     ++	bitmap_writer_finish(index, pdata->nr_objects, bitmap_name, options);
     + 
     + cleanup:
     + 	free(index);
      @@ midx.c: static int write_midx_internal(const char *object_dir,
       	if (flags & MIDX_WRITE_REV_INDEX &&
       	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
       		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
      +
     -+	/*
     -+	 * Writing the bitmap must be last, as it will free ctx.entries
     -+	 * to reduce memory pressure during the bitmap write.
     -+	 */
       	if (flags & MIDX_WRITE_BITMAP) {
     - 		if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx,
     +-		if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx,
     ++		struct packing_data pdata;
     ++		struct commit **commits;
     ++		uint32_t commits_nr;
     ++
     ++		if (!ctx.entries_nr)
     ++			BUG("cannot write a bitmap without any objects");
     ++
     ++		prepare_midx_packing_data(&pdata, &ctx);
     ++
     ++		commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, &ctx);
     ++
     ++		if (write_midx_bitmap(midx_name.buf, midx_hash, &pdata,
     ++				      commits, commits_nr, ctx.pack_order,
       				      refs_snapshot, flags) < 0) {
     + 			error(_("could not write multi-pack bitmap"));
     + 			result = 1;
 -:  ----------- > 3:  98e72f71b6b midx: reduce memory pressure while writing bitmaps

-- 
gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux