On Sat, Feb 20, 2010 at 11:23:11AM -0800, Junio C Hamano wrote:
>
> There needs some benchmarking to justify it, I think.
>
> So I'd split this hunk out when queuing.

I completely agree with your reasoning that this is a separate issue
and that it needs some benchmarking to prove its usefulness. So, I
have done that today.

For that, I created a repository of up to 512MB in size containing up
to 100,000 files (for small files, the file count was the limiting
factor; for big files, the total size limit applied). I intentionally
limited the total size to 512MB to make sure that all files fit in the
filesystem cache, so the results are free of disk-related effects. The
contents of the files were generated randomly from /dev/urandom and
then reused in all tests for a given file size.

I made 5 runs using mmap() (with git 1.7.0) and 5 runs using read()
(after applying the patch below). Below is the best result of the 5
runs for each size. "Before" marks the original version, which uses
mmap(); "After" marks the modified version, which uses read().

The command used to measure time was:

  cat list | time git hash-object --stdin-paths >/dev/null

So it only calculates the SHA-1, without deflating the content and
writing the result to disk (the latter depends significantly on
fsync() speed).

Tested on: Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz
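For each size, the test setup looked roughly like the sketch below
(paths and variable names are illustrative, not the exact script I
ran; SIZE_KB and COUNT take the values listed with each result):

  SIZE_KB=1        # file size for this run, in KB
  COUNT=100000     # number of files for this run
  mkdir -p testdata
  : >list
  for i in $(seq 1 $COUNT)
  do
          # random content, written once and reused for all runs of
          # this size; writing also brings the files into the page
          # cache, and the 512MB total keeps them there
          dd if=/dev/urandom of=testdata/f$i bs=1k count=$SIZE_KB 2>/dev/null
          echo testdata/f$i >>list
  done

  # one timed run; the best of 5 was taken for each version
  cat list | time git hash-object --stdin-paths >/dev/null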
Here are the results:

file size = 1KB; Hashing 100000 files
Before: 0.63user 0.86system 0:01.49elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+100428minor)pagefaults 0swaps
After:  0.54user 0.53system 0:01.07elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+421minor)pagefaults 0swaps

file size = 2KB; Hashing 100000 files
Before: 1.04user 0.79system 0:01.82elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+100428minor)pagefaults 0swaps
After:  0.95user 0.48system 0:01.43elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+422minor)pagefaults 0swaps

file size = 4KB; Hashing 100000 files
Before: 1.73user 0.74system 0:02.47elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+100428minor)pagefaults 0swaps
After:  1.54user 0.57system 0:02.11elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+424minor)pagefaults 0swaps

file size = 8KB; Hashing 64000 files
Before: 1.86user 0.63system 0:02.48elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+128428minor)pagefaults 0swaps
After:  1.75user 0.50system 0:02.23elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+424minor)pagefaults 0swaps

file size = 16KB; Hashing 32000 files
Before: 1.73user 0.41system 0:02.14elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+128428minor)pagefaults 0swaps
After:  1.68user 0.32system 0:02.00elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+425minor)pagefaults 0swaps

file size = 32KB; Hashing 16000 files
Before: 1.65user 0.32system 0:01.96elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+128428minor)pagefaults 0swaps
After:  1.64user 0.24system 0:01.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+431minor)pagefaults 0swaps

file size = 64KB; Hashing 8000 files
Before: 1.71user 0.17system 0:01.87elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+128428minor)pagefaults 0swaps
After:  1.63user 0.18system 0:01.81elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+438minor)pagefaults 0swaps

file size = 128KB; Hashing 4000 files
Before: 1.60user 0.20system 0:01.79elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+128428minor)pagefaults 0swaps
After:  1.55user 0.20system 0:01.75elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+454minor)pagefaults 0swaps

file size = 256KB; Hashing 2000 files
Before: 1.62user 0.15system 0:01.77elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+128429minor)pagefaults 0swaps
After:  1.56user 0.16system 0:01.71elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+551minor)pagefaults 0swaps

file size = 512KB; Hashing 1000 files
Before: 1.59user 0.17system 0:01.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+128428minor)pagefaults 0swaps
After:  1.56user 0.15system 0:01.71elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+679minor)pagefaults 0swaps

file size = 1024KB; Hashing 500 files
Before: 1.64user 0.15system 0:01.78elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+128429minor)pagefaults 0swaps
After:  1.61user 0.15system 0:01.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
        0inputs+0outputs (0major+934minor)pagefaults 0swaps

As you can see, the read() version performed better than mmap() in all
tests, though the difference decreases as the file size grows. While
for 1KB files the speed-up is 39% (based on the elapsed time), it is a
mere 1% for 1MB files. The minor page-fault counts show where the time
goes: with mmap() there is roughly one fault per 4KB page touched,
while the read() version stays at a few hundred faults regardless of
the number of files.

Note: I do not use strbuf_read(), because it is suboptimal for this
case, where we know the size in advance. (In fact, strbuf_read() is
not so good even when the size is unknown, as it does a redundant
strbuf_grow() in almost every use case; that probably needs to be
fixed.)

-- >8 --
From 6b3f8335dece7c9b9f810b1ab08f1bcb090e4d5e Mon Sep 17 00:00:00 2001
From: Dmitry Potapov <dpotapov@xxxxxxxxx>
Date: Sun, 21 Feb 2010 09:32:19 +0300
Subject: [PATCH] hash-object: don't use mmap() for small files

Using read() instead of mmap() gives a 39% speed-up for 1KB files and
a 1% speed-up for 1MB files. For larger files, it is better to keep
using mmap(), because the difference is no longer significant, and
when there is not enough memory, mmap() performs much better because
it avoids swapping.

Signed-off-by: Dmitry Potapov <dpotapov@xxxxxxxxx>
---
 sha1_file.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 657825e..8a83e56 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2434,6 +2434,8 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	return ret;
 }
 
+#define SMALL_FILE_SIZE (1024*1024)
+
 int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
 	     enum object_type type, const char *path)
 {
@@ -2448,6 +2450,14 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
 		else
 			ret = -1;
 		strbuf_release(&sbuf);
+	} else if (size <= SMALL_FILE_SIZE) {
+		char *buf = xmalloc(size);
+		if (size == read_in_full(fd, buf, size))
+			ret = index_mem(sha1, buf, size, write_object, type,
+					path);
+		else
+			ret = error("short read %s", strerror(errno));
+		free(buf);
 	} else if (size) {
 		void *buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
 		ret = index_mem(sha1, buf, size, write_object, type, path);
--
1.7.0
-- >8 --

Thanks,
Dmitry