[PATCH] midx: reduce memory pressure while writing bitmaps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Derrick Stolee <derrickstolee@xxxxxxxxxx>

We noticed that some 'git multi-pack-index write --bitmap' processes
were running with very high memory. It turns out that a lot of this
memory is required to store a list of every object in the written
multi-pack-index, with a second copy that has additional information
used for the bitmap writing logic.

Using 'valgrind --tool=massif' before this change, the following chart
shows how memory load increased and was maintained throughout the
process:

    GB
4.102^                                                             ::
     |              @  @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: :
     |         :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |      :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |    :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |    : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |    : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
   0 +--------------------------------------------------------------->

It turns out that the 'struct write_midx_context' data is persisting
through the life of the process, including the 'entries' array. This
array is used last inside find_commits_for_midx_bitmap() within
write_midx_bitmap(). If we free (and nullify) the array at that point,
we can free a decent chunk of memory before the bitmap logic adds more
to the memory footprint.

Here is the massif memory load chart after this change:

    GB
3.111^      #
     |      #                              :::::::::::@::::::::::::::@
     |      #        ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@
     |     @#  :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |  :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
   0 +--------------------------------------------------------------->

It is unfortunate that the lifetime of the 'entries' array is less
clear. To make this simpler, I added a few things to try and prevent an
accidental reference:

 1. Using FREE_AND_NULL() we will at least get a segfault from reading a
    NULL pointer instead of a use-after-free.

 2. 'entries_nr' is also set to zero to make any loop that would iterate
    over the entries be trivial.

 3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does
    not get another reference later. This requires adding a local copy
    of 'pack_order' giving us a reference that we can use later in the
    method.

 4. Add significant comments in write_midx_bitmap() and
    write_midx_internal() to add warnings for future authors who might
    accidentally add references to this cleared memory.

Signed-off-by: Derrick Stolee <derrickstolee@xxxxxxxxxx>
---
    midx: reduce memory pressure while writing bitmaps
    
    The thing I'm most worried about with this patch is whether there is
    enough (or too much) defensive programming.
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1292%2Fderrickstolee%2Fbitmap-memory-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1292/derrickstolee/bitmap-memory-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1292

 midx.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 5f0dd386b02..cc31d803a5f 100644
--- a/midx.c
+++ b/midx.c
@@ -1063,6 +1063,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 	struct commit **commits = NULL;
 	uint32_t i, commits_nr;
 	uint16_t options = 0;
+	uint32_t *pack_order;
 	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
 	int ret;
 
@@ -1076,6 +1077,15 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 
 	commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
 
+	/*
+	 * Remove the ctx.entries to reduce memory pressure.
+	 * Nullify 'ctx' to help avoid adding new references to ctx->entries.
+	 */
+	FREE_AND_NULL(ctx->entries);
+	ctx->entries_nr = 0;
+	pack_order = ctx->pack_order;
+	ctx = NULL;
+
 	/*
 	 * Build the MIDX-order index based on pdata.objects (which is already
 	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
@@ -1102,7 +1112,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 	 * bitmap_writer_finish().
 	 */
 	for (i = 0; i < pdata.nr_objects; i++)
-		index[ctx->pack_order[i]] = &pdata.objects[i].idx;
+		index[pack_order[i]] = &pdata.objects[i].idx;
 
 	bitmap_writer_select_commits(commits, commits_nr, -1);
 	ret = bitmap_writer_build(&pdata);
@@ -1443,6 +1453,11 @@ static int write_midx_internal(const char *object_dir,
 	if (flags & MIDX_WRITE_REV_INDEX &&
 	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
 		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
+
+	/*
+	 * Writing the bitmap must be last, as it will free ctx.entries
+	 * to reduce memory pressure during the bitmap write.
+	 */
 	if (flags & MIDX_WRITE_BITMAP) {
 		if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx,
 				      refs_snapshot, flags) < 0) {

base-commit: 9dd64cb4d310986dd7b8ca7fff92f9b61e0bd21a
-- 
gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux