Re: [PATCH 0/5] cache-tree revisited

Thomas Rast wrote:
> Junio C Hamano wrote:
> > Ahh, I forgot all about that exchange.
> > 
> >   http://thread.gmane.org/gmane.comp.version-control.git/178480/focus=178515
> > 
> > The cache-tree mechanism has traditionally been one of the more important
> > optimizations and it would be very nice if we can resurrect the behaviour
> > for "git commit" too.
> 
> Oh, I buried that.  Let's try something other than the aggressive
> strategy I had there: only compute cache-tree if
> 
> * we know we're going to need it soon, and we're about to write out
>   the index anyway (as in git-commit)

I had another idea: we could write out *just* a new cache-tree data
set at the end of git-commit.

Doing it the cheap way would mean rehashing the on-disk data without
actually touching it.  (That might not be so bad, but then if your
index is small, why is writing it from scratch expensive?)

Doing it efficiently requires making the sha1 restartable, which is
entirely doable with block-sha1/sha1.h (I haven't looked into
ppc/sha1.h).  As far as I can see it's not feasible with OpenSSL's
SHA-1.

That is, we would add a new index extension (say PSHA: partial SHA)
and structure the index as

  signature
  header
  cache data
  PSHA <sha state up until just before PSHA>
  TREE ...
  [REUC ...]
  sha1 footer

Then it's easy to cheaply replace only the extensions, by restarting
the hashing from the PSHA data and re-emitting only the extension
data.
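
To illustrate the restart idea, here is a minimal sketch in Python rather
than git's C.  It assumes the saved hash state can stand in for the PSHA
payload: hashlib only lets us copy() a state in memory, not serialize it,
so a real implementation would need a SHA-1 whose context can be written
to disk (as suggested above for block-sha1).  The function names and the
flat "body + extensions + footer" layout are hypothetical simplifications,
and the PSHA record itself is not written into the file here.

```python
import hashlib

def write_index(body, extensions):
    """Full write: hash everything, remembering the state just after the body."""
    h = hashlib.sha1()
    h.update(body)              # signature + header + cache data
    saved = h.copy()            # stands in for the PSHA payload (assumption)
    tail = b"".join(extensions)
    h.update(tail)
    return body + tail + h.digest(), saved

def replace_extensions(index, body_len, saved, extensions):
    """Cheap rewrite: resume hashing from the saved state, re-emit only the tail."""
    h = saved.copy()            # leave the saved state reusable
    tail = b"".join(extensions)
    h.update(tail)
    return index[:body_len] + tail + h.digest()
```

The point of the sketch is that replace_extensions() never feeds the cache
data to the hash again, yet the result is byte-for-byte what a full
rewrite would have produced.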

I think all the bits are in place, and it would be easy to do.
However, for it to make sense, we would have to make BLK_SHA1 the
default for the most-used platforms and also not mind extending the
SHA1 API.  Do you think that would fly?

I thought about other ways to make the index writing restartable from
the middle, but the only clean approach I came up with would require a
format change to something like

     signature
  0  header
  1  cache data
  2  sha1 of 0..1
  3  extension data A
  4  sha1 of 2..3
  5  extension data B
  6  sha1 of 4..5
  [possibly more]
  7  end-of-index marker
  8  sha1 of 6..7

etc.
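
A rough Python sketch of that chained layout, under a few assumptions of
mine: the signature is left out, the end-of-index marker is a made-up
four-byte tag, and each checksum covers the previous checksum plus the
section that follows it.  All names are hypothetical.

```python
import hashlib

SHA1_LEN = 20
END_MARKER = b"EOIX"  # hypothetical end-of-index marker

def sha1(*parts):
    h = hashlib.sha1()
    for p in parts:
        h.update(p)
    return h.digest()

def emit_tail(link, extensions):
    """Emit each extension followed by a checksum chained to the previous one."""
    out = []
    for ext in extensions:
        link = sha1(link, ext)
        out += [ext, link]
    out += [END_MARKER, sha1(link, END_MARKER)]
    return b"".join(out)

def write_index(header, cache, extensions):
    link = sha1(header, cache)           # "sha1 of 0..1"
    return header + cache + link + emit_tail(link, extensions)

def replace_extensions(index, cache_end, extensions):
    """Rewrite everything after the first checksum; cache_end is the
    offset just past the cache data."""
    keep = cache_end + SHA1_LEN
    link = index[cache_end:keep]         # reuse the stored checksum
    return index[:keep] + emit_tail(link, extensions)
```

With this layout the restart needs nothing beyond the ordinary SHA-1 API,
because the only state carried forward is a checksum already stored in the
file.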

-- 
Thomas Rast
trast@{inf,student}.ethz.ch