[PATCH 6/6] Automatically switch to crc32 checksum for index when it's too large

In an experiment built with -O3 on an Intel D510 @ 1.66GHz, index
reading time exceeds 0.5s at around 250k entries. Switching to crc32
brings it back below 0.2s, so the default threshold is set to 250000
entries.

On a 4M-file index, reading takes ~8.4s with SHA-1 and 2.8s with crc32.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
---
 I don't know of any real repository this size, though. gentoo-x86 is
 "only" 120k. I haven't checked the libreoffice repo yet.

 On a 2M-file index, allocating one big block (i.e. reverting debed2a
 (read-cache.c: allocate index entries individually - 2011-10-24))
 saves about 0.3s. Maybe we can allocate one big block up front, then
 malloc entries separately once the block is fully used.
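
 To make the "one big block, then malloc" idea concrete, here is a
 rough sketch. It is not code from read-cache.c: struct entry_pool and
 the pool_* helpers are hypothetical names, and 8-byte alignment is
 assumed to be enough for an index entry.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pool: one up-front block, per-entry malloc once it is full. */
struct entry_pool {
	char *block;   /* the single big allocation */
	size_t size;   /* total bytes in the block */
	size_t used;   /* bytes handed out so far */
};

static int pool_init(struct entry_pool *p, size_t size)
{
	p->block = malloc(size);
	p->size = p->block ? size : 0;
	p->used = 0;
	return p->block ? 0 : -1;
}

/* Carve from the block while it has room, otherwise fall back to malloc. */
static void *pool_alloc(struct entry_pool *p, size_t len)
{
	size_t aligned = (len + 7) & ~(size_t)7;   /* assumed 8-byte alignment */

	if (aligned >= len && aligned <= p->size - p->used) {
		void *ret = p->block + p->used;
		p->used += aligned;
		return ret;
	}
	return malloc(len);
}

/* True if ptr was carved out of the big block. */
static int pool_owns(const struct entry_pool *p, const void *ptr)
{
	uintptr_t start = (uintptr_t)p->block;
	uintptr_t addr = (uintptr_t)ptr;

	return addr >= start && addr - start < p->size;
}

/* Only entries that came from the malloc fallback are freed one by one. */
static void pool_free_entry(struct entry_pool *p, void *ptr)
{
	if (!pool_owns(p, ptr))
		free(ptr);
}
```

 The cost is that freeing an entry needs the ownership check, and
 entries inside the block are only released when the block itself is
 freed.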

 Writing time is still high. "git update-index --crc32" on a 250k-entry
 crc32 index takes 0.9s (so writing time is about 0.5s).

 A better solution may be narrow clone (or just the narrow checkout
 part), where the index only contains entries from checked-out
 subdirectories.
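
 For reviewers who want to check the threshold semantics without
 building the series, the check added to write_index() can be isolated
 roughly as below. apply_crc32_threshold() is a hypothetical helper,
 not part of the patch, and the value given to CACHE_F_CRC here is an
 assumption (the real flag is defined earlier in the series).

```c
#define CACHE_F_CRC 1u   /* assumed value, for illustration only */

/*
 * Stand-alone version of the check added to write_index():
 *   threshold < 0   leave the flags alone (decided by other means)
 *   threshold == 0  always SHA-1, so the crc32 flag is cleared
 *   threshold > 0   crc32 once the entry count reaches the threshold
 */
static unsigned int apply_crc32_threshold(unsigned int flags,
					  unsigned int nr, int threshold)
{
	if (threshold < 0)
		return flags;
	if (threshold > 0 && nr >= (unsigned int)threshold)
		return flags | CACHE_F_CRC;
	return flags & ~CACHE_F_CRC;
}
```

 Note that with a non-negative threshold a crc32 index that shrinks
 below the threshold goes back to SHA-1 on the next write, e.g.
 apply_crc32_threshold(CACHE_F_CRC, 10, 250000) clears the flag.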

 Documentation/config.txt |    7 +++++++
 builtin/update-index.c   |    1 +
 cache.h                  |    1 +
 config.c                 |    5 +++++
 environment.c            |    1 +
 read-cache.c             |    8 ++++++++
 6 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index abeb82b..55b7596 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -540,6 +540,13 @@ relatively high IO latencies.  With this set to 'true', git will do the
 index comparison to the filesystem data in parallel, allowing
 overlapping IO's.
 
+core.crc32IndexThreshold::
+	SHA-1 is normally used to check index integrity. When the
+	number of entries in the index reaches this threshold, crc32
+	is used instead. Zero means SHA-1 is always used. A negative
+	value disables the threshold (i.e. the choice between crc32
+	and SHA-1 is made by other means).
+
 core.createObject::
 	You can set this to 'link', in which case a hardlink followed by
 	a delete of the source are used to make sure that object creation
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 6913226..5cb51c7 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -856,6 +856,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	argc = parse_options_end(&ctx);
 
 	if (do_crc != -1) {
+		core_crc32_index_threshold = -1;
 		if (do_crc)
 			the_index.hdr_flags |= CACHE_F_CRC;
 		else
diff --git a/cache.h b/cache.h
index 7352402..d05856b 100644
--- a/cache.h
+++ b/cache.h
@@ -610,6 +610,7 @@ extern unsigned long pack_size_limit_cfg;
 extern int read_replace_refs;
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_crc32_index_threshold;
 extern int core_apply_sparse_checkout;
 
 enum branch_track {
diff --git a/config.c b/config.c
index 40f9c6d..905e071 100644
--- a/config.c
+++ b/config.c
@@ -671,6 +671,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.crc32indexthreshold")) {
+		core_crc32_index_threshold = git_config_int(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.createobject")) {
 		if (!strcmp(value, "rename"))
 			object_creation_mode = OBJECT_CREATION_USES_RENAMES;
diff --git a/environment.c b/environment.c
index c93b8f4..9d9dfc2 100644
--- a/environment.c
+++ b/environment.c
@@ -66,6 +66,7 @@ unsigned long pack_size_limit_cfg;
 
 /* Parallel index stat data preload? */
 int core_preload_index = 0;
+int core_crc32_index_threshold = 250000;
 
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
diff --git a/read-cache.c b/read-cache.c
index a34878e..fd032d8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1582,6 +1582,14 @@ int write_index(struct index_state *istate, int newfd)
 		}
 	}
 
+	if (core_crc32_index_threshold >= 0) {
+		if (core_crc32_index_threshold > 0 &&
+		    istate->cache_nr >= core_crc32_index_threshold)
+			istate->hdr_flags |= CACHE_F_CRC;
+		else
+			istate->hdr_flags &= ~CACHE_F_CRC;
+	}
+
 	hdr.h.hdr_signature = htonl(CACHE_SIGNATURE);
 	if (istate->hdr_flags) {
 		hdr.h.hdr_version = htonl(4);
-- 
1.7.8.36.g69ee2
