[PATCH] don't use mmap() to hash files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



If a mmapped file is changed by another program during git-add, it
causes the repository corruption. Disabling mmap() in index_fd() does
not have any negative impact on the overall speed of Git. In fact, it
makes git hash-object to work slightly faster. Here is the best result
before and after patch based on 5 runs on the Linix kernel repository:

Before:

$ git ls-files | time git hash-object --stdin-path > /dev/null
2.15user 0.36system 0:02.52elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+103248minor)pagefaults 0swaps

After:

$ git ls-files | time ../git/git hash-object --stdin-path > /dev/null
2.09user 0.33system 0:02.42elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1073minor)pagefaults 0swaps

Signed-off-by: Dmitry Potapov <dpotapov@xxxxxxxxx>
---

I think more people should test this change to see its impact on
performance. For me, it was positive. Here is my results:

Using mmap (git version 1.7.0)

2.18user 0.33system 0:02.52elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+103248minor)pagefaults 0swaps
$ git ls-files | time git hash-object --stdin-path > /dev/null
2.23user 0.28system 0:02.53elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+103249minor)pagefaults 0swaps
$ git ls-files | time git hash-object --stdin-path > /dev/null
2.20user 0.31system 0:02.52elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+103248minor)pagefaults 0swaps
$ git ls-files | time git hash-object --stdin-path > /dev/null
2.21user 0.30system 0:02.51elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+103248minor)pagefaults 0swaps
$ git ls-files | time git hash-object --stdin-path > /dev/null
2.15user 0.36system 0:02.52elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+103248minor)pagefaults 0swaps

Using read() instead of mmap() (1.7.0 with the above patch)

$ git ls-files | time ../git/git hash-object --stdin-path > /dev/null
2.19user 0.24system 0:02.42elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1073minor)pagefaults 0swaps
$ git ls-files | time ../git/git hash-object --stdin-path > /dev/null
2.15user 0.26system 0:02.42elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1073minor)pagefaults 0swaps
$ git ls-files | time ../git/git hash-object --stdin-path > /dev/null
2.18user 0.24system 0:02.42elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1073minor)pagefaults 0swaps
$ git ls-files | time ../git/git hash-object --stdin-path > /dev/null
2.18user 0.25system 0:02.42elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1073minor)pagefaults 0swaps
$ git ls-files | time ../git/git hash-object --stdin-path > /dev/null
2.09user 0.33system 0:02.42elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1073minor)pagefaults 0swaps


 sha1_file.c |   22 +++++++---------------
 1 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 657825e..83f82a2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2438,22 +2438,14 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
 	     enum object_type type, const char *path)
 {
 	int ret;
-	size_t size = xsize_t(st->st_size);
+	struct strbuf sbuf = STRBUF_INIT;
 
-	if (!S_ISREG(st->st_mode)) {
-		struct strbuf sbuf = STRBUF_INIT;
-		if (strbuf_read(&sbuf, fd, 4096) >= 0)
-			ret = index_mem(sha1, sbuf.buf, sbuf.len, write_object,
-					type, path);
-		else
-			ret = -1;
-		strbuf_release(&sbuf);
-	} else if (size) {
-		void *buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
-		ret = index_mem(sha1, buf, size, write_object, type, path);
-		munmap(buf, size);
-	} else
-		ret = index_mem(sha1, NULL, size, write_object, type, path);
+	if (strbuf_read(&sbuf, fd, 4096) >= 0)
+		ret = index_mem(sha1, sbuf.buf, sbuf.len, write_object,
+				type, path);
+	else
+		ret = -1;
+	strbuf_release(&sbuf);
 	close(fd);
 	return ret;
 }
-- 
1.7.0

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]