On Thu, Feb 16, 2012 at 8:20 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Thu, Feb 16, 2012 at 02:37:47PM +0100, Piotr Krukowiecki wrote:
>
>> >> $ time git status -- .
>> >> real    0m2.503s
>> >> user    0m0.160s
>> >> sys     0m0.096s
>> >>
>> >> $ time git status
>> >> real    0m9.663s
>> >> user    0m0.232s
>> >> sys     0m0.556s
>> >
>> > Did you drop caches here, too?
>>
>> Yes I did - with warm caches, status takes something like 0.1-0.3s on the whole repo.
>
> OK, then that makes sense. It's pretty much just I/O on the filesystem
> and on the object db.
>
> You can break status down a little more to see which is which. Try "git
> update-index --refresh" to see just how expensive the lstat and index
> handling is.

"git update-index --refresh" with dropped caches took

real    0m3.726s
user    0m0.024s
sys     0m0.404s

while "git status" with dropped caches takes

real    0m13.578s
user    0m0.240s
sys     0m0.600s

I'm not sure why it takes more than the 9s reported before - IIRC I ran
the previous test in single-user mode under a bare shell, and this time
I'm testing under GNOME. Either that, or it's an effect of running
update-index first :/ Status on a subdirectory now takes 9.5s, so the
rule that status on a subdirectory is not much faster still holds.

> And then try "git diff-index HEAD" for an idea of how expensive it is to
> just read the objects and compare to the index.

diff-index after dropping caches takes

real    0m14.095s
user    0m0.268s
sys     0m0.564s

>> > Not really. You're showing an I/O problem, and repacking is git's way of
>> > reducing I/O.
>>
>> So if I understand correctly, the reason is that git must compare
>> workspace files with packed objects - and the problem is
>> reading/seeking/searching in the packs?
>
> Mostly reading (we keep a sorted index and access the packfiles via
> mmap, so we only touch the pages we need). But you're also paying to
> lstat() the directory tree, too. And you're paying to load (probably)
> the whole index into memory, although it's relatively compact compared
> to the actual file data.
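The breakdown Jeff suggests can be reproduced in a throwaway repository; this is only a sketch of the measurement procedure (the repo and file names here are made up, and the cache-drop step from the thread is shown commented out because it requires root):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com
git config user.name you
echo one > a.txt
git add a.txt
git commit -q -m initial

# To get cold-cache numbers as in the thread, drop the page cache first
# (requires root); left commented out here:
#   sync && echo 3 > /proc/sys/vm/drop_caches

# Stat-only cost: refresh the cached lstat data recorded in the index.
time git update-index --refresh

# Object-reading cost: compare HEAD's trees/blobs against the index.
time git diff-index HEAD

# A file modified in the workdir shows up in diff-index output:
echo two > a.txt
git diff-index HEAD -- a.txt
```

On a large repo the interesting part is the difference between the two `time` results, which separates lstat/index overhead from object-database reads.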
If the index here means the objects/pack/*.idx files, then it's 21MB.

>> Is there a way to make packs better? I think most operations are on
>> workdir files - so maybe it'd be possible to tell gc/repack/whatever
>> to optimize access to files which I currently have in the workdir?
>
> It already does optimize for that case. If you can make it even better,
> I'm sure people would be happy to see the numbers.

If I understand correctly, you only need to compute the sha1 of the
workdir files and compare it with the sha1s recorded in the
index/gitdir. Does getting the sha1s from the index/gitdir really
require reading the packfiles? Maybe it'd be possible to cache/index
them somehow, for example in a separate, smaller file?

> Mostly I think it is just the case that disk I/O is slow, and the
> operation you're asking for has to do a certain amount of it. What kind
> of disk/filesystem are you pulling off of?
>
> It's not a fuse filesystem by any chance, is it? I have a repo on an
> encfs-mounted filesystem, and the lstat times are absolutely horrific.

No, it's ext4, and the disk is a Seagate Barracuda 7200.12 500GB, as it
reads on the cover :)

But IMO a faster disk won't help with this - the times will be smaller,
but you still have to read the same data, so status on a subdirectory
will still be only about 2x faster than on the whole repo, won't it? So
maybe in my case it will go down to e.g. 2s on a subdirectory, but for
someone with a larger repository it will still be 10s...

-- 
Piotr Krukowiecki
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
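As an aside on the sha1 comparison described in the message above: the blob sha1s are recorded directly in .git/index, and `git ls-files -s` prints them without touching the packfiles. A minimal sketch of that comparison, in a throwaway repo with an illustrative file name:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com
git config user.name you
echo hello > file.txt
git add file.txt

# sha1 recorded in the index (second field of "mode sha1 stage path"):
indexed=$(git ls-files -s file.txt | awk '{print $2}')

# sha1 of the current workdir content, computed without writing anything:
current=$(git hash-object file.txt)
[ "$indexed" = "$current" ] && echo unchanged

# After editing the file, the two differ - which is how a modification
# is detected once the lstat shortcut cannot rule a file out:
echo changed > file.txt
current=$(git hash-object file.txt)
[ "$indexed" != "$current" ] && echo modified
```

In practice status rarely hashes anything: the stat data cached in the index lets it skip files whose size/mtime are unchanged, which is why the lstat cost dominates.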