update-index --assume-unchanged doesn't make things go fast

Hi all,

Using git 1.5.6.64.g85fe, but this applies to various other versions I've tried.

I have a git repo with 17000+ files in 1000+ directories.  On
Linux, "git status" runs in under a second, which is perfectly fine.
But on Windows, which can apparently only stat() about 1000 files per
second, "git status" takes at least 17 seconds to run, even with a hot
cache.  (I've confirmed that stat() is this slow on Windows by writing a
simple program that just runs stat() in a tight loop.  The slowness
may be cygwin-related, as I found some direct Win32 calls that seem to
go more than twice as fast... which is still too slow.)
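
For anyone who wants to repeat that measurement, here is a minimal sketch of such a tight loop.  It is not the program mentioned above (that one isn't shown); it is a Python stand-in, and the names time_lstat and walk_files are made up for illustration.  It uses lstat(), the call that shows up in the strace output further down.

```python
import os
import time


def time_lstat(paths, repeats=1):
    """Call os.lstat() on every path and return (calls, elapsed_seconds)."""
    calls = 0
    start = time.perf_counter()
    for _ in range(repeats):
        for path in paths:
            try:
                os.lstat(path)
            except OSError:
                pass  # a vanished file still costs one system call
            calls += 1
    return calls, time.perf_counter() - start


def walk_files(root):
    """Collect every file path under root, skipping any .git directory."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        found.extend(os.path.join(dirpath, name) for name in filenames)
    return found


if __name__ == "__main__":
    files = walk_files(".")
    calls, elapsed = time_lstat(files)
    rate = calls / elapsed if elapsed > 0 else float("inf")
    print(f"{calls} lstat() calls in {elapsed:.3f}s ({rate:.0f} per second)")
```

Run it twice from the top of a working tree and look at the second result, so that the cache is hot.  Interpreter overhead is included in the timing, so treat the number as a lower bound on what the OS can do.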

"git status" is not so important, since I can choose not to run it.
But it turns out that every git checkout and git commit does all the
same stuff, which is really not so great.  Even worse if you consider
that "git status" is almost always what I do by hand anyway to check
things before I commit.

So anyway, I read about the git-update-index --assume-unchanged
option, and thought that might be just what I want.  So I did this
(back in Linux, where things are easier to debug):

$ strace -fe lstat64 git status 2>&1 | wc -l
17869

$ git ls-files | xargs -d '\n' git update-index --assume-unchanged

$ strace -fe lstat64 git status 2>&1 | wc -l
33
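
To double-check that the bit really got set on every entry, "git ls-files -v" prints a status tag in front of each path and uses a lowercase tag for entries marked assume-unchanged.  The sketch below parses that output; the function name is hypothetical, and it assumes paths without special characters (ls-files quotes those unless -z is used).

```python
import subprocess


def split_by_assume_unchanged(ls_files_v_output):
    """Split `git ls-files -v` output into (assumed, checked) path lists.

    Each line looks like "<tag> <path>"; a lowercase tag means the
    index entry has the assume-unchanged bit set.
    """
    assumed, checked = [], []
    for line in ls_files_v_output.splitlines():
        if len(line) < 3:
            continue  # skip blank or malformed lines
        tag, path = line[0], line[2:]
        (assumed if tag.islower() else checked).append(path)
    return assumed, checked


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["git", "ls-files", "-v"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run git ls-files: {exc}")
    else:
        assumed, checked = split_by_assume_unchanged(result.stdout)
        print(f"{len(assumed)} assume-unchanged, {len(checked)} still checked")
```

After the xargs command above, the second number should be zero if every tracked file was marked.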

So far, so good, and "git status" is now noticeably faster on my Linux
system (maybe twice as fast).  It's also noticeably faster on my
Windows system, but not as fast as I would have hoped.  I've tracked
it down to this:

$ strace -fe getdents64 git status 2>&1 | wc -l
2729

"git status" still checks all the *directories* to see if there are
any new files.  Of course!  --assume-unchanged can't be applied to a
directory, so there's no way to tell it not to do so.
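
A rough way to see where that count comes from: the number of directory reads scales with the number of directories in the tree, not with the number of files.  The sketch below counts the directories implied by a list of tracked paths (for example the output of "git ls-files").  The function name is made up, and the estimate assumes a reader needs about two getdents64 calls per directory (one that returns entries, one that reports the end), which is typical on Linux but not guaranteed.  Under that assumption, 2729 calls would be roughly consistent with the 1000+ directories mentioned earlier.

```python
import posixpath


def tracked_directories(paths):
    """Return the set of directories (with ancestors) that hold tracked paths.

    Paths use "/" separators, as printed by git.  The repository root
    is represented by the empty string.
    """
    dirs = {""}
    for path in paths:
        parent = posixpath.dirname(path)
        # Stop as soon as we reach a directory we have already seen:
        # all of its ancestors were added at that time.
        while parent and parent not in dirs:
            dirs.add(parent)
            parent = posixpath.dirname(parent)
    return dirs


sample = ["Makefile", "src/main.c", "src/util/str.c", "doc/README"]
print(len(tracked_directories(sample)))  # root, src, src/util, doc -> 4
```

Directories that contain only untracked files are not counted here, so the real number of reads can be higher.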

Also, "git diff" is still as slow as ever:

$ strace -fe lstat64 git diff 2>&1 | wc -l
23199

It seems to be stat()ing the files even though they are
--assume-unchanged, which is probably a simple bug.

And while we're here, "git checkout" seems to be working a lot harder
than it should be:

$ strace -fe lstat64 git checkout -b boo 2>&1 | wc -l
23227

Note that I'm just creating a new branch name here, not even checking
out any new files, so I can't think of any situation where the
checkout would fail.  Is there one?

Even if I check out a totally different branch, presumably it should
only need to stat() the files that changed between the old and new
versions, right?  And that would normally be very fast.
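
To make that expectation concrete, here is a small sketch of the set of paths a branch switch would have to look at under this reasoning.  It illustrates the idea only and is not how git-checkout is implemented.  The function name is hypothetical, and each tree is assumed to be a dict mapping path to blob id, such as one might build from "git ls-tree -r <commit>".

```python
def paths_needing_check(old_tree, new_tree):
    """Return sorted paths whose blob differs or that exist on one side only.

    old_tree and new_tree map path -> blob id for the commit being left
    and the commit being checked out.
    """
    changed = []
    for path in old_tree.keys() | new_tree.keys():
        if old_tree.get(path) != new_tree.get(path):
            changed.append(path)
    return sorted(changed)


old = {"a.c": "1111", "b.c": "2222", "c.c": "3333"}
new = {"a.c": "1111", "b.c": "9999", "d.c": "4444"}
print(paths_needing_check(old, new))  # ['b.c', 'c.c', 'd.c']
print(paths_needing_check(old, old))  # [] (the "checkout -b" case)
```

In the "git checkout -b boo" case both trees are the same commit, so the list is empty and, by this argument, nothing would need to be stat()ed at all.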

I don't mind doing some of the work to improve things here, as long as
people can give me some advice.  Specifically:

1) What's a sensible way to tell git to *not* opendir() specific
directories to look for unexpected files in "git status"?  (I don't
think I know enough to implement this myself.)

2) Do you think git-diff should honour --assume-unchanged?  If not, why not?

3) Do you think git-checkout can be optimized here?  I can see why it
might want to disregard --assume-unchanged (for safety reasons), but
presumably it only needs to look at the files that it's planning to
change, right?

4) My idea is to eventually --assume-unchanged my whole repository,
then write a cheesy daemon that uses the Win32 dnotify-equivalent to
watch for files that get updated and then selectively
--no-assume-unchanged files that it gets notified about.  That would
avoid the need to ever synchronously scan the whole repo for changes,
thus making my git-Win32 experience much faster and more enjoyable.
(This daemon ought to be possible to run on Linux as well, for similar
improvements on gigantic repositories.  Also note that TortoiseSVN for
Windows does something similar to track file status updates, so this
isn't *just* me being crazy.)
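
The git-facing half of such a daemon could be sketched as below.  This is only an outline under assumptions: the notification source (Win32 change notifications, or inotify on Linux) is left out entirely, the function name and chunk_size parameter are made up, and the function only builds the "git update-index --no-assume-unchanged" command lines rather than running them.

```python
def build_unmark_commands(changed_paths, tracked_paths, chunk_size=100):
    """Turn a batch of change notifications into git argument lists.

    changed_paths: repo-relative paths reported by the (not shown) watcher.
    tracked_paths: paths currently in the index, "/"-separated.
    Returns a list of argv lists, each naming at most chunk_size paths.
    """
    tracked = set(tracked_paths)
    pending = []
    seen = set()
    for raw in changed_paths:
        path = raw.replace("\\", "/")  # Win32 reports backslash separators
        if path in tracked and path not in seen:
            seen.add(path)
            pending.append(path)
    commands = []
    for start in range(0, len(pending), chunk_size):
        chunk = pending[start:start + chunk_size]
        commands.append(
            ["git", "update-index", "--no-assume-unchanged", "--"] + chunk
        )
    return commands
```

Each returned list can be handed to subprocess.run().  Notifications for untracked paths are dropped here, so newly created files would still need separate handling, which is the same directory-scanning problem as in question 1.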

Thoughts?

Thanks,

Avery