Re: Why do base objects appear behind the delta in packs?

Junio C Hamano <junkio@xxxxxxx> wrote:
> Shawn Pearce <spearce@xxxxxxxxxxx> writes:
> 
> >> Shawn Pearce wrote:
> >> 
> >> > From a data locality perspective putting the base object before
> >> > or after the delta shouldn't matter, as either way the delta
> >> > is useless without the base.  So placing the base immediately
> >> > before the delta should perform just as well as placing it after.
> >> > Either way the OS should have the base in cache by the time the
> >> > delta is being accessed.
> >... 
> > I'm going to shut up now and not say anything further on the subject
> > unless I've got some hard results indicating a different organization
> > is better or worse than what we have right now.
> 
> I think that may be a sensible thing to do (no sarcasm -- I
> think this measurement is long overdue).
> 
> The code was initially proposed just like you suggested but is
> in the current form precisely for the reason of avoiding
> back-seek.  I distinctly remember me asking Linus "does mmap()
> favor forward scan by doing readahead?  I thought its point was
> to allow random access" (the answer is "yes" and "yes but
> forward is the common case").
> 
> The pack-using side in sha1_file.c used to read a deltified object
> (both header and delta) in full, pick up and read its base, and
> apply the delta to the base.  This was thought to be memory hungry
> on a longer delta chain, so the current code reads only the header
> of a deltified object, reads the base, then reads the delta to
> apply.  The last step involves seeking back, and might make the
> back-seek avoidance less effective than before.
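
To make the two read orders above concrete, here is a toy model in
Python.  It is only a sketch, not the code in sha1_file.c: the offsets,
the (header, body) tuples and the function names are all made up for
illustration.  It assumes each delta sits at a lower pack offset than
its base, as in the current layout, and counts how often a reader has
to move backwards.

```python
def old_order(chain, base):
    """Old scheme: read each delta in full (header + body), then the base.

    chain is a list of (header_offset, body_offset) pairs, requested
    object first; base is the offset of the non-delta base object.
    """
    reads = []
    for hdr, body in chain:
        reads += [hdr, body]
    reads.append(base)
    return reads


def new_order(chain, base):
    """Current scheme: headers only on the way down, then the base,
    then go back for each delta body, innermost delta first."""
    reads = [hdr for hdr, _ in chain]
    reads.append(base)
    reads += [body for _, body in reversed(chain)]
    return reads


def back_seeks(reads):
    """Count transitions that move to a lower offset than the previous read."""
    return sum(1 for prev, cur in zip(reads, reads[1:]) if cur < prev)


# Hypothetical offsets: two deltas stored ahead of their base.
chain = [(100, 104), (300, 304)]
base = 600

print(back_seeks(old_order(chain, base)))  # 0
print(back_seeks(new_order(chain, base)))  # 2
```

In this model the old order only ever moves forward but has to keep
every delta body around until the base has been read, while the new
order pays one back-seek per delta in the chain.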

Thank you.  That was the sort of response I was looking for.  :-)

I know Jon wants to shrink that ~500 MB Mozilla pack to something
a lot smaller, and I'd like to help him do that without losing huge
amounts of performance on the read.  Very long delta chains (5000!)
are simply impossible to wade through for even one object; doing it
for an entire commit to check out the files is something I wouldn't
want to wish on anyone.
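
As a rough illustration of why chain depth hurts, the sketch below
counts how many stored objects have to be read to rebuild a single
object at the end of a chain.  The base_of mapping and the obj names
are hypothetical stand-ins, not anything from a real pack, and the
model ignores caching entirely.

```python
def reads_needed(base_of, name):
    """Count the stored objects read to reconstruct `name`.

    base_of maps an object name to the name of its delta base, or to
    None when the object is stored whole.  The walk is iterative so a
    very deep chain does not hit Python's recursion limit.
    """
    reads = 1
    while base_of[name] is not None:
        name = base_of[name]
        reads += 1
    return reads


# Hypothetical chain: obj0 is stored whole, obj1 is a delta on obj0, ...
DEPTH = 5000
base_of = {"obj0": None}
for i in range(1, DEPTH + 1):
    base_of[f"obj{i}"] = f"obj{i - 1}"

print(reads_needed(base_of, f"obj{DEPTH}"))  # 5001: 5000 deltas plus the base
```

Without any caching of intermediate results, a checkout that touches
many files near the ends of such chains would pay a cost like this
once per file.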

So I'm probably going to wind up spending some time doing research
and experimentation on pack storage.  I may just discover we're
as good as we can get.  Or I may find that doing something else
saves us only 5% at the cost of far too much complexity and thus
isn't really worth doing.  Or I may get lucky and discover a way
to improve on what we have.

More on this thread (maybe) in a few months.  I have other stuff
I should be doing right now.  :)

-- 
Shawn.