Re: [WISH] Store also tag dereferences in packed-refs

On Sun, 19 Nov 2006, Marco Costalba wrote:
> 
> It does not seem there are any strange delays, but the total time is high
> (very I/O bound)

This looks more normal. No truly horrid IO times. With your disk, having 
an uncached "stat64()" taking ~50ms is not at all impossible, if you just 
end up having to do a few seeks for directory/inode information.

> $ time strace -o tracefile -Ttt git show-ref -d >> /dev/null
> 0.02user 0.01system 0:02.39elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (127major+894minor)pagefaults 0swaps

So in addition to the "stat()" calls on all the objects you have 
referenced, you also had 127 page faults that needed to do IO (probably a 
combination of executable and pack-file accesses). 
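
To see where the elapsed time actually went, one could total the per-call times that strace's -T option appends (as "<seconds>") to each line of "tracefile". This is a minimal sketch, assuming the single-process "strace -Ttt" line layout (epoch timestamp, then the call, then the trailing duration); `summarize` is a hypothetical helper, not part of git or strace.

```python
import re
from collections import defaultdict

# One "strace -Ttt" line: epoch timestamp, syscall name, arguments/result,
# and the "<seconds>" duration that -T appends at the very end.
LINE_RE = re.compile(r"^\d+\.\d+\s+(\w+)\(.*<(\d+\.\d+)>\s*$")


def summarize(lines):
    """Return {syscall: (count, total_seconds)} for a strace -Ttt log."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # signals, "+++ exited" lines, unfinished calls, etc.
        name, secs = m.group(1), float(m.group(2))
        totals[name][0] += 1
        totals[name][1] += secs
    return {name: (n, t) for name, (n, t) in totals.items()}
```

For example, `summarize(open("tracefile"))` would show how much of the run was spent in stat64() versus open()/read(). Note that -T measures time inside each syscall only; time spent on major page faults (the 127 above) does not appear in any syscall and has to be inferred from the gap between the total and the elapsed time.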

I think the only way to avoid this is to either not do the object lookups
at all (which you really cannot currently avoid with "-d", since the whole
point is to dereference the objects if they are tags), or to do some silly
optimizations like fsck does.

For example, it's often (but not always) faster to do all the readdir()s
first, then sort the entries by inode number before looking them up, to
avoid back-and-forth disk movement. But quite frankly, that kind of stuff
probably isn't sane to do in "git show-ref".
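
The fsck-style trick might look roughly like this. It is a minimal Python sketch, not how fsck or git implements it: read all the directory entries first, sort them by the inode number that readdir already returns, and only then do the stat() calls. `stat_in_inode_order` is a hypothetical helper, and whether the ordering helps at all depends on how the filesystem lays out its inodes.

```python
import os


def stat_in_inode_order(dirs):
    """Stat every entry in the given directories in ascending inode order.

    Phase 1 only reads directories (readdir), phase 2 sorts by inode
    number, phase 3 does the potentially seek-heavy stat() calls.
    """
    entries = []
    for d in dirs:
        with os.scandir(d) as it:
            for e in it:
                # On POSIX, DirEntry.inode() comes from readdir's d_ino,
                # so getting the number needs no extra stat() call.
                entries.append((e.inode(), e.path))
    entries.sort()  # ascending inode number, hopefully ascending disk offset
    return [(path, os.stat(path, follow_symlinks=False)) for _, path in entries]
```

For loose objects this would be called with the two-hex-digit fan-out directories, e.g. `stat_in_inode_order(glob.glob(".git/objects/??"))`.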

So the optimizations that _can_ be done are:

 - add dereference info to .git/packed-refs

   This would allow us to simply skip the expensive object lookup for
   every packed tag. We'd still have to do it for non-packed (loose) refs,
   of course, but the real cost is that over time you may accumulate
   hundreds of tags, and even if each one only takes 0.02s to look up,
   a few hundred of them still add up to several seconds.

 - avoid the object lookups for "heads/" refs (which we know are supposed
   to be commits, and so cannot be tags), and skip them entirely when not
   specifying "-d". This won't help your case very much, though. If you
   want "-d", you want it, and the _big_ number of refs tends to be in
   tags, not branches, anyway.

 - use a filesystem with nicer locality behaviour for directory entries
   and inodes. This can cut the cost of the cold-cache case by a factor of
   two, but right now there are no good filesystems that do this (but see
   for example "spadfs", which Mikulas Patocka announced a few weeks ago on
   linux-kernel; it would seem to have the potential to be better in this
   area. I looked at the code and it looked like it could become very
   reasonable, but I've not actually _tested_ it, so...)

Anyway, I think that if we really want to make "git show-ref" go fast
when things are cold in the cache, with lots of tags and "-d" (which
is a reasonable case to optimize for: it's probably exactly what we end up
doing both for gitweb _and_ for "git-send-pack"), we'd need to expand the
packed-refs file with the deref cache.

Junio?

			Linus