In an experiment built with -O3 on an Intel D510 @ 1.66GHz, index
reading time exceeds 0.5s at around 250k entries. Switching to crc32
brings it back below 0.2s. On a 4M-file index, reading takes ~8.4s
with SHA-1 and 2.8s with crc32.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
---
 I know of no real repositories this size, though. gentoo-x86 is
 "only" 120k. I haven't checked the libreoffice repo yet.

 On a 2M-file index, allocating one big block (i.e. reverting debed2a
 (read-cache.c: allocate index entries individually - 2011-10-24))
 saves about 0.3s. Maybe we can allocate one big block, then malloc
 separately when the block is fully used.

 Writing time is still high. "git update-index --crc32" on a crc32
 250k index takes 0.9s (so writing time is about 0.5s).

 A better solution may be narrow clone (or just the narrow checkout
 part), where the index only contains entries from checked-out
 subdirectories.

 Documentation/config.txt |    7 +++++++
 builtin/update-index.c   |    1 +
 cache.h                  |    1 +
 config.c                 |    5 +++++
 environment.c            |    1 +
 read-cache.c             |    8 ++++++++
 6 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index abeb82b..55b7596 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -540,6 +540,13 @@ relatively high IO latencies.  With this set to 'true', git will do the
 	index comparison to the filesystem data in parallel, allowing
 	overlapping IO's.
 
+core.crc32IndexThreshold::
+	SHA-1 is normally used to check index integrity. When the
+	number of entries in the index reaches this threshold, crc32
+	is used instead. Zero means SHA-1 is always used. A negative
+	value disables this threshold (i.e. crc32 or SHA-1 is decided
+	by other means).
+
 core.createObject::
 	You can set this to 'link', in which case a hardlink followed by
 	a delete of the source are used to make sure that object creation
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 6913226..5cb51c7 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -856,6 +856,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	argc = parse_options_end(&ctx);
 
 	if (do_crc != -1) {
+		core_crc32_index_threshold = -1;
 		if (do_crc)
 			the_index.hdr_flags |= CACHE_F_CRC;
 		else
diff --git a/cache.h b/cache.h
index 7352402..d05856b 100644
--- a/cache.h
+++ b/cache.h
@@ -610,6 +610,7 @@ extern unsigned long pack_size_limit_cfg;
 extern int read_replace_refs;
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_crc32_index_threshold;
 extern int core_apply_sparse_checkout;
 
 enum branch_track {
diff --git a/config.c b/config.c
index 40f9c6d..905e071 100644
--- a/config.c
+++ b/config.c
@@ -671,6 +671,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.crc32indexthreshold")) {
+		core_crc32_index_threshold = git_config_int(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.createobject")) {
 		if (!strcmp(value, "rename"))
 			object_creation_mode = OBJECT_CREATION_USES_RENAMES;
diff --git a/environment.c b/environment.c
index c93b8f4..9d9dfc2 100644
--- a/environment.c
+++ b/environment.c
@@ -66,6 +66,7 @@ unsigned long pack_size_limit_cfg;
 
 /* Parallel index stat data preload? */
 int core_preload_index = 0;
+int core_crc32_index_threshold = 250000;
 
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
diff --git a/read-cache.c b/read-cache.c
index a34878e..fd032d8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1582,6 +1582,14 @@ int write_index(struct index_state *istate, int newfd)
 		}
 	}
 
+	if (core_crc32_index_threshold >= 0) {
+		if (core_crc32_index_threshold > 0 &&
+		    istate->cache_nr >= core_crc32_index_threshold)
+			istate->hdr_flags |= CACHE_F_CRC;
+		else
+			istate->hdr_flags &= ~CACHE_F_CRC;
+	}
+
 	hdr.h.hdr_signature = htonl(CACHE_SIGNATURE);
 	if (istate->hdr_flags) {
 		hdr.h.hdr_version = htonl(4);
-- 
1.7.8.36.g69ee2
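
For reference, the three-way meaning of core.crc32IndexThreshold
(negative, zero, positive) can be sketched as a standalone function.
This is only an illustration of the check added to write_index(), not
code from the patch: apply_crc32_threshold() and SKETCH_F_CRC are
hypothetical names, with SKETCH_F_CRC standing in for CACHE_F_CRC.

```c
/* Hypothetical stand-in for the CACHE_F_CRC header flag in the patch. */
#define SKETCH_F_CRC 0x1u

/*
 * Return the header flags after applying the threshold rule:
 *   threshold < 0  : leave flags untouched (crc32/SHA-1 chosen elsewhere,
 *                    e.g. by "git update-index --crc32")
 *   threshold == 0 : always SHA-1, so the CRC flag is cleared
 *   threshold > 0  : crc32 once nr_entries >= threshold, SHA-1 below it
 */
static unsigned int apply_crc32_threshold(unsigned int flags,
					  int threshold,
					  unsigned int nr_entries)
{
	if (threshold < 0)
		return flags;
	if (threshold > 0 && nr_entries >= (unsigned int)threshold)
		return flags | SKETCH_F_CRC;
	return flags & ~SKETCH_F_CRC;
}
```

With the default of 250000 from environment.c, an index of 249999
entries keeps SHA-1 and one of 250000 entries switches to crc32. Note
that a non-negative threshold also clears a CRC flag that was already
set, which is why update-index sets the threshold to -1 when the user
picks the checksum explicitly.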