Re: [PATCH v3 0/2] read-cache: call verify_hdr() in a background thread

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King <peff@xxxxxxxx> writes:

> So just mentioning the test case and the improvement in the commit
> message is sufficient, IMHO.

So here is how I butchered [v3 1/2] to tentatively queue it on 'pu'.

Notable suggested changes I have in this one are:

 * I stole the numbers from the cover letter of v2 and added them at
   the end of the log message.

 * As the checksum is not a useless relic, but is an integrity
   check, I dropped the "ancient relic" from the proposed log
   message.  It is just that the modern disks are reliable enough to
   make it worthwhile to think about a trade-off this patch makes
   between performance and integrity.

 * As it is customary, the configuration variable starts as an opt
   in feature.  In a few releases, perhaps we can flip the default,
   but we do not do so from day one.

 * Updated the code around the call to config-get-bool to avoid
   asking the same question twice.

 * Added minimum documentation.

By the way, are we sure we have something that validates the
checksum regardless of the configuration setting?  If not, we may
want to tweak this further so that we can force the validation from
"git fsck" or something.  I am not going to do that myself, but it
may be necessary before this graduates to 'master'.

Thanks.

-- >8 --
From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
Date: Tue, 28 Mar 2017 19:07:31 +0000
Subject: [PATCH] read-cache: core.checksumindex

Teach git to skip verification of the SHA-1 checksum at the end of
the index file in verify_hdr() called from read_index() when the
core.checksumIndex configuration variable is set to false.

The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.

On the Linux kernel repository, the effect is rather trivial.
The time to reading its index with 58k entries drops from 0.0284 sec
down to 0.0155 sec.

On my Windows source tree (450MB index), I'm seeing a savings of 0.6
seconds -- read_index() went from 1.2 to 0.6 seconds.

Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
---
 Documentation/config.txt |  8 ++++++++
 read-cache.c             | 16 ++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1df1965457..bc7b216d43 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -329,6 +329,14 @@ advice.*::
 		show directions on how to proceed from the current state.
 --
 
+core.checksumIndex::
+	Tell Git to validate the checksum at the end of the index
+	file to detect corruption.  Defaults to `true`.  Those who
+	work on a project with too many files may want to set this
+	variable to `false` to make it faster to load the index (in
+	exchange for reliability, but in general modern disks are
+	reliable enough for most people).
+
 core.fileMode::
 	Tells Git if the executable bit of files in the working tree
 	is to be honored.
diff --git a/read-cache.c b/read-cache.c
index e447751823..3195702cf7 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1376,12 +1376,28 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size)
 	git_SHA_CTX c;
 	unsigned char sha1[20];
 	int hdr_version;
+	static int do_checksum = -1;
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
 		return error("bad signature");
 	hdr_version = ntohl(hdr->hdr_version);
 	if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
 		return error("bad index version %d", hdr_version);
+
+	if (do_checksum < 0) {
+		/*
+		 * Since we run very early in command startup, git_config()
+		 * may not have been called yet and the various "core_*"
+		 * global variables haven't been set.  So look it up
+		 * explicitly.
+		 */
+		git_config_get_bool("core.checksumindex", &do_checksum);
+		if (do_checksum < 0)
+			do_checksum = 0; /* default to false */
+	}
+	if (!do_checksum)
+		return 0;
+
 	git_SHA1_Init(&c);
 	git_SHA1_Update(&c, hdr, size - 20);
 	git_SHA1_Final(sha1, &c);
-- 
2.12.2-727-gf32eb5229d




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]