[PATCH/RFC v2 0/9] Reduce pack-objects memory footprint

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The array of object_entry in pack-objects can take a lot of memory
when pack-objects is run in "pack everything" mode. On linux-2.6.git,
this array alone takes roughly 800MB.

This series reorders some fields and reduces field size... to keep
this struct smaller. Its size goes from 136 bytes to 96 bytes (29%) on
64-bit linux and saves 260MB on linux-2.6.git.

v2 only differs in limits, documentation and a new escape hatch for
the pack file limit.

Now the bad side:

- the number of pack files pack-objects can handle is reduced to 16k
  (previously unlimited, v1 has it at 4k)
- max delta chain is also limited to 4096 (previously practically
  unlimited), same as v1
- some patches are quite invasive (e.g. replacing pointer with
  uint32_t) and reduces readability a bit.
- it may be tricker to add more data in object_entry in the future.

In v1, if the total pack count is above the 4k limit, pack-objects
dies. v2 is a bit smarter and only count packs that do not have the
companion .keep files. This allows users with 16k+ pack files to
continue to use pack-objects by first adding .keep to reduce pack
count, repack, remove (some) .keep files, repack again...

While this process could be automated at least by 'git repack', given
the unbelievably high limit 16k, I don't think it's worth doing it.

Interdiff

diff --git a/Documentation/config.txt b/Documentation/config.txt
index f57e9cf10c..9bd3f5a789 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2412,6 +2412,7 @@ pack.window::
 pack.depth::
 	The maximum delta depth used by linkgit:git-pack-objects[1] when no
 	maximum depth is given on the command line. Defaults to 50.
+	Maximum value is 4095.
 
 pack.windowMemory::
 	The maximum size of memory that is consumed by each thread
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 81bc490ac5..b8d936ccf5 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -96,7 +96,9 @@ base-name::
 	it too deep affects the performance on the unpacker
 	side, because delta data needs to be applied that many
 	times to get to the necessary object.
-	The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
 
 --window-memory=<n>::
 	This option provides an additional limit on top of `--window`;
@@ -267,6 +269,15 @@ Unexpected missing object will raise an error.
 	locally created objects [without .promisor] and objects from the
 	promisor remote [with .promisor].)  This is used with partial clone.
 
+LIMITATIONS
+-----------
+
+This command could only handle 16384 existing pack files at a time.
+If you have more than this, you need to exclude some pack files with
+".keep" file and --honor-pack-keep option, to combine 16k pack files
+in one, then remove these .keep files and run pack-objects one more
+time.
+
 SEE ALSO
 --------
 linkgit:git-rev-list[1]
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index ae750e9e11..25c83c4927 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -90,7 +90,9 @@ other objects in that pack they already have locally.
 	space. `--depth` limits the maximum delta depth; making it too deep
 	affects the performance on the unpacker side, because delta data needs
 	to be applied that many times to get to the necessary object.
-	The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
 
 --threads=<n>::
 	This option is passed through to `git pack-objects`.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 45076f2523..55f19a1f18 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1038,7 +1038,7 @@ static int want_object_in_pack(const struct object_id *oid,
 	if (*found_pack) {
 		want = want_found_object(exclude, *found_pack);
 		if (want != -1)
-			return want;
+			goto done;
 	}
 
 	list_for_each(pos, &packed_git_mru) {
@@ -1061,11 +1061,27 @@ static int want_object_in_pack(const struct object_id *oid,
 			if (!exclude && want > 0)
 				list_move(&p->mru, &packed_git_mru);
 			if (want != -1)
-				return want;
+				goto done;
 		}
 	}
 
-	return 1;
+	want = 1;
+done:
+	if (want && *found_pack && !(*found_pack)->index) {
+		struct packed_git *p = *found_pack;
+
+		if (to_pack.in_pack_count >= (1 << OE_IN_PACK_BITS))
+			die(_("too many packs to handle in one go. "
+			      "Please add .keep files to exclude\n"
+			      "some pack files and keep the number "
+			      "of non-kept files below %d."),
+			    1 << OE_IN_PACK_BITS);
+
+		p->index = to_pack.in_pack_count++;
+		to_pack.in_pack[p->index] = p;
+	}
+
+	return want;
 }
 
 static void create_object_entry(const struct object_id *oid,
@@ -1088,6 +1104,8 @@ static void create_object_entry(const struct object_id *oid,
 	else
 		nr_result++;
 	if (found_pack) {
+		if (found_pack->index <= 0)
+			die("BUG: found_pack should be NULL instead of having non-positive index");
 		entry->in_pack_idx = found_pack->index;
 		entry->in_pack_offset = found_offset;
 	}
@@ -2978,18 +2996,12 @@ static int option_parse_unpack_unreachable(const struct option *opt,
 
 static void init_in_pack_mapping(struct packing_data *to_pack)
 {
-	struct packed_git *p;
-	int i = 0;
-
 	/* let IN_PACK() return NULL if in_pack_idx is zero */
-	to_pack->in_pack[i++] = NULL;
-
-	for (p = packed_git; p; p = p->next, i++) {
-		if (i >= (1 << OE_IN_PACK_BITS))
-			die("BUG: too many packs to handle!");
-		to_pack->in_pack[i] = p;
-		p->index = i;
-	}
+	to_pack->in_pack[to_pack->in_pack_count++] = NULL;
+	/*
+	 * the rest is lazily initialized only for packs that we want
+	 * in want_object_in_pack().
+	 */
 }
 
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
diff --git a/pack-objects.h b/pack-objects.h
index ec4eba4ee4..a57aca5f03 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -3,7 +3,7 @@
 
 #define OE_DFS_STATE_BITS 2
 #define OE_DEPTH_BITS 12
-#define OE_IN_PACK_BITS 12
+#define OE_IN_PACK_BITS 14
 
 #define IN_PACK_POS(to_pack, obj) \
 	(to_pack)->in_pack_pos[(struct object_entry *)(obj) - (to_pack)->objects]
@@ -60,7 +60,7 @@ struct object_entry {
 
 	unsigned depth:OE_DEPTH_BITS;
 
-	/* size: 96 */
+	/* size: 96, bit_padding: 18 bits */
 };
 
 struct packing_data {
@@ -71,6 +71,7 @@ struct packing_data {
 	uint32_t index_size;
 
 	unsigned int *in_pack_pos;
+	int in_pack_count;
 	struct packed_git *in_pack[1 << OE_IN_PACK_BITS];
 };
 

Nguyễn Thái Ngọc Duy (9):
  pack-objects: document holes in struct object_entry.h
  pack-objects: turn type and in_pack_type to bitfields
  pack-objects: use bitfield for object_entry::dfs_state
  pack-objects: use bitfield for object_entry::depth
  pack-objects: note about in_pack_header_size
  pack-objects: move in_pack_pos out of struct object_entry
  pack-objects: move in_pack out of struct object_entry
  pack-objects: refer to delta objects by index instead of pointer
  pack-objects: reorder 'hash' to pack struct object_entry

 Documentation/config.txt           |   1 +
 Documentation/git-pack-objects.txt |  13 +-
 Documentation/git-repack.txt       |   4 +-
 builtin/pack-objects.c             | 207 +++++++++++++++++++----------
 cache.h                            |   3 +
 object.h                           |   1 -
 pack-bitmap-write.c                |   8 +-
 pack-bitmap.c                      |   2 +-
 pack-bitmap.h                      |   4 +-
 pack-objects.h                     |  71 ++++++----
 10 files changed, 212 insertions(+), 102 deletions(-)

-- 
2.16.1.435.g8f24da2e1a




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux