On 2/4/12 7:53 AM, Nguyen Thai Ngoc Duy wrote:
On Fri, Feb 3, 2012 at 9:20 PM, Joshua Redstone <joshua.redstone@xxxxxx> wrote:
I timed a few common operations with both a warm OS file cache and a cold
cache. That is, I ran 'echo 3 | tee /proc/sys/vm/drop_caches' and then ran
the operation in question a few times (the first timing is the cold timing,
the next few are the warm timings). The following results are from a server
with an average hard drive (i.e., not flash) and > 10GB of RAM.
'git status' : 39 minutes cold, and 24 seconds warm.
'git blame': 44 minutes cold, 11 minutes warm.
'git add' (appending a few chars to the end of a file and adding it): 7
seconds cold and 5 seconds warm.
'git commit -m "foo bar3" --no-verify --untracked-files=no --quiet
--no-status': 41 minutes cold, 20 seconds warm. I also hacked a version
of git to remove the three or four places where 'git commit' stats every
file in the repo, and this dropped the times to 30 minutes cold and 8
seconds warm.
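For anyone who wants to repeat the cold/warm measurement above, here is a
rough Python sketch of the procedure. The helper names (drop_caches,
time_command) are made up for illustration; dropping caches is Linux-only
and needs root.

```python
import os
import subprocess
import time


def drop_caches():
    """Flush dirty pages, then drop the page cache, dentries and inodes.

    Linux only, needs root. Writing "3" here has the same effect as
    'echo 3 | tee /proc/sys/vm/drop_caches'.
    """
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def time_command(cmd, runs=3, cwd=None):
    """Run cmd `runs` times; return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded so terminal I/O does not skew the timing.
        subprocess.run(cmd, cwd=cwd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return timings
```

Calling drop_caches() and then time_command(["git", "status"], runs=4)
inside the repository would give one cold timing followed by three warm
ones.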
Have you tried "git update-index --assume-unchanged"? That should
reduce the mass lstat() calls and hopefully improve the above numbers.
The interface is not exactly easy to use, but if it gives a significant
gain, then we can try to improve the UI.
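A minimal sketch of how one might set the bit on every tracked file, in
case it helps with trying this out. The function names are hypothetical;
the git options used (ls-files -z, update-index --assume-unchanged,
--no-assume-unchanged, -z, --stdin) are existing ones. Note that with the
bit set, git assumes you will not modify those files, so it has to be
cleared again for anything you actually edit.

```python
import subprocess


def assume_unchanged_cmd(undo=False):
    """Build the update-index command; paths arrive NUL-separated on stdin."""
    flag = "--no-assume-unchanged" if undo else "--assume-unchanged"
    # --stdin is kept last, since older git versions require that.
    return ["git", "update-index", flag, "-z", "--stdin"]


def mark_all_tracked(repo, undo=False):
    """Set (or clear) the assume-unchanged bit on every tracked file."""
    paths = subprocess.run(["git", "ls-files", "-z"], cwd=repo,
                           stdout=subprocess.PIPE, check=True).stdout
    subprocess.run(assume_unchanged_cmd(undo), cwd=repo,
                   input=paths, check=True)
```

mark_all_tracked("/path/to/repo") sets the bit everywhere, and passing
undo=True clears it again.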
On the index size issue, ideally we should make minimal writes to the
index instead of rewriting the whole 191 MB file. An improvement we could
make now is to compress it, reducing the disk footprint and thus disk
I/O. If you compress the index with gzip, how big is it?
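One quick way to answer the gzip question without touching the index
itself is to compress a copy in memory and compare sizes. This is only a
sketch; compressed_size is a made-up helper, and it assumes the index is
at .git/index under the current directory.

```python
import gzip
import os


def compressed_size(path, level=6):
    """Return (raw_bytes, gzip_bytes) for the file at `path`."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data), len(gzip.compress(data, compresslevel=level))


if __name__ == "__main__":
    # Assumes the current directory is the top of a work tree.
    index_path = os.path.join(".git", "index")
    if os.path.exists(index_path):
        raw, packed = compressed_size(index_path)
        print(f"{index_path}: {raw} bytes raw, {packed} bytes gzipped")
```

The file is read fully into memory, which should be fine for an index of
a couple hundred MB on a machine with > 10GB of RAM.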
If you're not afraid to add filesystem-specific code to git, you could
leverage the btrfs find-new command (or use the ioctl directly) to
quickly find the files changed since a certain point in time. Other CoW
filesystems may have similar mechanisms. You could, for example, store
the last generation id in an index extension; that's what those
extensions are for, right?
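To make the idea a bit more concrete, here is a rough sketch of parsing
the output of 'btrfs subvolume find-new <subvol> <last_gen>' into a set
of changed paths plus the generation to remember for next time. The line
format matched below is an assumption based on what btrfs-progs prints
here and may differ between versions; parse_find_new is a made-up name.

```python
import re

# Assumed per-extent line format, e.g.:
#   inode 257 file offset 0 len 4096 disk start 0 offset 0 gen 7 flags NONE a/b.c
_EXTENT = re.compile(
    r"^inode \d+ file offset \d+ len \d+ disk start \d+ "
    r"offset \d+ gen \d+ flags \S+ (.*)$")
# Assumed trailing line, e.g.: transid marker was 1234
_MARKER = re.compile(r"^transid marker was (\d+)$")


def parse_find_new(output):
    """Return (set of changed paths, generation marker or None)."""
    changed = set()
    marker = None
    for line in output.splitlines():
        m = _EXTENT.match(line)
        if m:
            # A file with several changed extents is listed more than once.
            changed.add(m.group(1))
            continue
        m = _MARKER.match(line)
        if m:
            marker = int(m.group(1))
    return changed, marker
```

git would then only need to lstat() the returned paths and could store
the marker (in an index extension, say) as the starting generation for
the next run.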
tom