From: Derrick Stolee <derrickstolee@xxxxxxxxxx> We noticed that some 'git multi-pack-index write --bitmap' processes were running with very high memory. It turns out that a lot of this memory is required to store a list of every object in the written multi-pack-index, with a second copy that has additional information used for the bitmap writing logic. Using 'valgrind --tool=massif' before this change, the following chart shows how memory load increased and was maintained throughout the process: GB 4.102^ :: | @ @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: : | :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: : 0 +---------------------------------------------------------------> It turns out that the 'struct write_midx_context' data is persisting through the life of the process, including the 'entries' array. This array is used last inside find_commits_for_midx_bitmap() within write_midx_bitmap(). If we free (and nullify) the array at that point, we can free a decent chunk of memory before the bitmap logic adds more to the memory footprint. Here is the massif memory load chart after this change: GB 3.111^ # | # :::::::::::@::::::::::::::@ | # ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@ | @# :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ | :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@ 0 +---------------------------------------------------------------> It is unfortunate that the lifetime of the 'entries' array is less clear. To make this simpler, I added a few things to try and prevent an accidental reference: 1. Using FREE_AND_NULL() we will at least get a segfault from reading a NULL pointer instead of a use-after-free. 2. 'entries_nr' is also set to zero to make any loop that would iterate over the entries be trivial. 3. Set the 'ctx' pointer to NULL within write_midx_bitmap() so it does not get another reference later. This requires adding a local copy of 'pack_order' giving us a reference that we can use later in the method. 4. Add significant comments in write_midx_bitmap() and write_midx_internal() to add warnings for future authors who might accidentally add references to this cleared memory. Signed-off-by: Derrick Stolee <derrickstolee@xxxxxxxxxx> --- midx: reduce memory pressure while writing bitmaps The thing I'm most worried about with this patch is whether there is enough (or too much) defensive programming. Thanks, -Stolee Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1292%2Fderrickstolee%2Fbitmap-memory-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1292/derrickstolee/bitmap-memory-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/1292 midx.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/midx.c b/midx.c index 5f0dd386b02..cc31d803a5f 100644 --- a/midx.c +++ b/midx.c @@ -1063,6 +1063,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, struct commit **commits = NULL; uint32_t i, commits_nr; uint16_t options = 0; + uint32_t *pack_order; char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash)); int ret; @@ -1076,6 +1077,15 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx); + /* + * Remove the ctx.entries to reduce memory pressure. + * Nullify 'ctx' to help avoid adding new references to ctx->entries. + */ + FREE_AND_NULL(ctx->entries); + ctx->entries_nr = 0; + pack_order = ctx->pack_order; + ctx = NULL; + /* * Build the MIDX-order index based on pdata.objects (which is already * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of @@ -1102,7 +1112,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, * bitmap_writer_finish(). */ for (i = 0; i < pdata.nr_objects; i++) - index[ctx->pack_order[i]] = &pdata.objects[i].idx; + index[pack_order[i]] = &pdata.objects[i].idx; bitmap_writer_select_commits(commits, commits_nr, -1); ret = bitmap_writer_build(&pdata); @@ -1443,6 +1453,11 @@ static int write_midx_internal(const char *object_dir, if (flags & MIDX_WRITE_REV_INDEX && git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0)) write_midx_reverse_index(midx_name.buf, midx_hash, &ctx); + + /* + * Writing the bitmap must be last, as it will free ctx.entries + * to reduce memory pressure during the bitmap write. + */ if (flags & MIDX_WRITE_BITMAP) { if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx, refs_snapshot, flags) < 0) { base-commit: 9dd64cb4d310986dd7b8ca7fff92f9b61e0bd21a -- gitgitgadget