On Wed, 9 Oct 2013, Nguyễn Thái Ngọc Duy wrote: > I know I still have a lot of holes to plug, but this was more > interesting because we could see some encouraging numbers. > Unfortunately the result is disappointing. Maybe I did it in a stupid > way and need to restart with a totally different way. > > "rev-list --objects" on v2 takes 4 secs, v4 with current walker 11s > and the new walker 16s (worst!). Clear indication that something is wrong if the intermediate step of converting to the canonical object representation is faster than your attempt at a native pv4 walker. However here's the numbers I get here after a fresh clone of git.git $ time git revlist --objects --all > /dev/null real 0m2.619s user 0m2.577s sys 0m0.033s $ mkdir orig $ mv .git/objects/pack/pack-* orig/ $ ../../git/test-packv4 orig/pack-*.pack .git/objects/pack/pack-foo.pack Scanning objects: 100% (162785/162785), done. Writing objects: 100% (162785/162785), done. $ time git rev-list --objects --all > /dev/null real 0m6.210s user 0m6.140s sys 0m0.027s [installing git with your latest patches applied here] $ time git rev-list --objects --all > /dev/null real 0m20.409s user 0m20.337s sys 0m0.029s So... I get even *worse* numbers relative to v2 than you do! Now let's mitigate the deep delta chaining effect in the tree encoding: $ rm .git/objects/pack/pack-foo.* $ ../../git/test-packv4 --min-tree-copy=50 orig/pack-*.pack .git/objects/pack/pack-foo.pack Scanning objects: 100% (162785/162785), done. Writing objects: 100% (162785/162785), done. $ time git rev-list --objects --all > /dev/null real 0m9.451s user 0m9.393s sys 0m0.036s Using --min-tree-copy=50 produces a pack which is still smaller than pack v2 but any tree copy sequence must refer to a minimum of 50 entries. This significantly reduces the CPU usage in decode_entries() by reducing the needless chaining effect I mentioned here: http://article.gmane.org/gmane.comp.version-control.git/234975 Now let's keep that pack and back out your patches. [installing git with your latest patches reverted here] $ time git rev-list --objects --all > /dev/null real 0m3.751s user 0m3.711s sys 0m0.029s So I shoved almost 2.5 seconds of runtime here. Not the half of the original runtime, but still significant. Let's push the chaining theory even further: $ rm .git/objects/pack/pack-foo.* $ ../../git/test-packv4 --min-tree-copy=100 orig/pack-*.pack .git/objects/pack/pack-foo.pack Scanning objects: 100% (162785/162785), done. Writing objects: 100% (162785/162785), done. $ time git rev-list --objects --all > /dev/null real 0m3.445s user 0m3.406s sys 0m0.029s With --min-tree-copy=100 the resulting pack gets somewhat larger than the v2 equivalent, but still smaller if we take into account the size of both the pack and index files. However the runtime benefit is no more significant. So, there are 2 conclusions here: 1: The current tree delta euristic is inefficient for pack v4. 2- Something must be very wrong in your latest patches as they make it close to 3 times more expensive than without them. > The only bit I haven't done is avoid checking if a tree is already > examined, if so do not bother with copy sequences referring to it. > That should cut down the number of decode_entries but not sure how > much because there's no relation between tree traversing order and how > copy sequences are made. I'm sure it would help mitigate the deep delta chaining effect as well. > Maybe we could make an exception and allow the tree walker to pass > pv4_tree_cache* directly to decode_entries so it does not need to do > the first lookup every time.. > > Suggestions? I'll try to have a look at your patches in more details soon. Nicolas