Re: [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Sat, 8 Aug 2009 17:18:42 +0200 (CEST)

Hi,

On Fri, 7 Aug 2009, Nicolas Pitre wrote:

> On Fri, 7 Aug 2009, Johannes Schindelin wrote:
> 
> > Hi,
> > 
> > On Fri, 7 Aug 2009, Nicolas Pitre wrote:
> > 
> > > On Fri, 7 Aug 2009, Sam Vilain wrote:
> > > 
> > > > Johannes Schindelin wrote:
> > > > >> the short answer is that cache slices are totally independent of 
> > > > >> pack files.
> > > > >>     
> > > > >
> > > > > My idea with that was that you already have a SHA-1 map in the pack 
> > > > > index, and if all you want to be able to accelerate the revision 
> > > > > walker, you'd probably need something that adds yet another mapping, 
> > > > > from commit to parents and tree, and from tree to sub-tree and blob 
> > > > > (so you can avoid unpacking commit and tree objects).
> > > > >   
> > > > 
> > > > Tying indexes together like that is not a good idea in the database 
> > > > world. Especially since in this case, as Nick mentions, the domain is 
> > > > subtly different (i.e. pack vs. DAG). Unfortunately you just can't try to 
> > > > pretend that they will always be the same; you can't force a full 
> > > > repack on every ref change!
> > > 
> > > Right.  And the rev cache must work even if the repository is not 
> > > packed.
> > 
> > Umm, why?  AFAICT the principal purpose of the rev cache is to help work 
> > loads on, say, www.kernel.org.
> 
> So what?
> 
> Speeding up rev-list with a rev cache is completely orthogonal to 
> whether the repository is packed or not.

No, it is not.

For both technical and practical reasons, caching revision walker data is
very closely related to packing.

You are _very_ unlikely to be helped by speeding up revision walking in 
the general case, _especially_ when you do stuff like blame or -S that 
needs to unpack tons of objects _anyway_.

The one big kicker argument for speeding up revision walking _is_ to 
relieve the loads on big ass servers, and they _should_ be as packed as 
possible (as I will patiently explain over and over again).

> It is like having a "git diff" result cache: no one would think of 
> stuffing that in the pack index.

Do you want to try to kid me?  You'll have to try harder.  Caching "git 
diff" results... no, really!

> If we want to improve on the repository packing format, that must be 
> doable without bothering with an independent concept such as a rev 
> cache.

I would love to tell you that you're right, but the single fact that 
pack v4 is starting to compete with Duke Nukem Forever just prevents me 
from doing that.

> > I am unlikely to notice the improvements in my regular "git log" calls 
> > that only show a couple of pages before I quit the pager.
> 
> Indeed.  But what is your point again?

Oh?  My point?  Being that the rev cache has a certain target audience, 
and that the regular user is not part of that audience, and that it just 
so happens that the _technical_ similarities with the pack index can be 
exploited in those scenarios?

IOW we can be pretty certain that a heavy-load server has a fully (or 
next-to-fully) packed object database.  The pack indices already contain a 
SHA-1 table that we can simply reuse.  And it should not be hard (or 
fragile) at all to put the "cached" information about parents, 
referenced tree and blob objects into that file, into a different section.

After all, the parents, referenced tree and blob objects change as 
often as the objects in the pack: never.
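
To illustrate the idea, here is a minimal sketch in Python (for brevity 
only).  It is a toy model, not git's actual pack index format: the class 
ToyPackIndex, its methods and the short object names are all made up for 
this example.  It assumes the extra section can be keyed by an object's 
position in the sorted SHA-1 table, so that walking needs no unpacking of 
commit objects.

```python
from bisect import bisect_left


class ToyPackIndex:
    """Hypothetical stand-in for a pack index: a sorted table of object
    names plus an extra section keyed by position in that table."""

    def __init__(self, object_names):
        # The sorted name table that a pack index already contains.
        self.names = sorted(object_names)
        # Assumed extra section: commit position -> (tree position,
        # tuple of parent positions).
        self.commit_info = {}

    def position(self, name):
        """Binary-search the sorted table for an object name."""
        i = bisect_left(self.names, name)
        if i == len(self.names) or self.names[i] != name:
            raise KeyError(name)
        return i

    def add_commit(self, commit, tree, parents):
        """Record the cached tree and parents of a commit by position."""
        self.commit_info[self.position(commit)] = (
            self.position(tree),
            tuple(self.position(p) for p in parents),
        )

    def walk(self, tip):
        """Yield every commit reachable from tip, consulting only the
        cached section (no object is unpacked)."""
        seen = set()
        stack = [self.position(tip)]
        while stack:
            pos = stack.pop()
            if pos in seen:
                continue
            seen.add(pos)
            yield self.names[pos]
            _tree, parents = self.commit_info[pos]
            stack.extend(parents)
```

Since neither the objects nor their positions in the table change once 
the pack is written, such a section would only need to be computed once 
per pack, at the same time as the index itself.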

Ciao,
Dscho
