Re: Why do base objects appear behind the delta in packs?

Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> Shawn Pearce wrote:
> 
> > From a data locality perspective putting the base object before
> > or after the delta shouldn't matter, as either way the delta
> > is useless without the base.  So placing the base immediately
> > before the delta should perform just as well as placing it after.
> > Either way the OS should have the base in cache by the time the
> > delta is being accessed.
> 
> _Should_ perform? Have you got any measurements of speed of creating "base
> before delta" pack, and reading objects from this kind of pack?

No, not yet.  It just seemed odd to me that the base was put behind
the delta.  That forces unpack-objects to hold the delta in memory
until it finds the corresponding base later in the stream, when it
would have been just as simple to require that the base appear
before the delta.  I wondered what the rationale was for the
additional complexity in unpack-objects.
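
To make the extra bookkeeping concrete, here is a toy model of a
streaming unpacker that must park a delta until its base shows up.
It is only a sketch, not how unpack-objects is written: the
(name, base, payload) tuples, the unpack_stream name, and "applying"
a delta by string concatenation are all assumptions made for
illustration.

```python
# Toy model only: each entry is (name, base_name_or_None, payload).
# "Applying a delta" is faked as base_content + payload.
def unpack_stream(entries):
    resolved = {}      # name -> full content
    pending = {}       # base name -> [(name, delta payload), ...]
    peak_pending = 0   # most deltas held in memory at any one time

    def resolve(name, content):
        # Store the object, then resolve every delta waiting on it,
        # including deltas of deltas.
        stack = [(name, content)]
        while stack:
            n, c = stack.pop()
            resolved[n] = c
            for child, delta in pending.pop(n, []):
                stack.append((child, c + delta))

    for name, base, payload in entries:
        if base is None:
            resolve(name, payload)
        elif base in resolved:
            resolve(name, resolved[base] + payload)
        else:
            # Base not seen yet: the delta has to be kept around.
            pending.setdefault(base, []).append((name, payload))
        held = sum(len(v) for v in pending.values())
        peak_pending = max(peak_pending, held)
    return resolved, peak_pending
```

With every base ahead of its deltas the pending table stays empty;
with the delta first, it has to hold each delta until the base
arrives, which is the complexity in question.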

Nicolas' reply pointed out that the current arrangement of base
after delta may actually offer improved performance due to the
OS performing read-ahead when you seek to the delta.  But he also
pointed out this base after delta situation should be rather rare
as we try to delta older objects against newer objects and we try to
place newer objects at the front of the pack, so it likely shouldn't
matter that much.


I just instrumented builtin-pack-objects.c to count how many times
we put the delta before the base and then repacked a current Git
repo with `git repack -a -d -f`.  28167 objects, 19170 deltas. 6003
deltas appeared before their base objects.  So about 31% of the
deltas.  That's certainly not the common case but it does occur with
some frequency.  However, after re-sorting the output of
verify-pack -v by offset and visually looking at the entries you can
clearly see it doesn't happen very often early in the pack.  Most of
the objects in the front of the pack are undeltified commits.
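
The same count can probably be approximated without patching
anything, by post-processing `git verify-pack -v` output.  The
sketch below assumes the documented line layout (id, type, size,
size-in-pack, offset, and for deltas also depth and base id); the
function name count_deltas_before_base is made up here, and this is
not the instrumentation used for the numbers above.

```python
import re

# Object ids are 40 hex digits (SHA-1); allow up to 64 for longer hashes.
_HEX = re.compile(r"^[0-9a-f]{40,64}$")

def count_deltas_before_base(verify_pack_output):
    """Return (objects, deltas, deltas stored at a lower offset than their base)."""
    offsets = {}   # object id -> offset in the pack
    deltas = []    # (object id, base id)
    for line in verify_pack_output.splitlines():
        fields = line.split()
        # Object lines have 5 fields (non-delta) or 7 (delta);
        # anything else is a summary line and is skipped.
        if len(fields) not in (5, 7) or not _HEX.match(fields[0]):
            continue
        offsets[fields[0]] = int(fields[4])
        if len(fields) == 7:
            deltas.append((fields[0], fields[6]))
    before = sum(
        1 for oid, base in deltas
        # A base missing from this pack cannot be compared, so ignore it.
        if base in offsets and offsets[oid] < offsets[base]
    )
    return len(offsets), len(deltas), before
```

Feeding it the text produced by `git verify-pack -v` on a pack's
.idx file gives the three numbers needed for a ratio such as
6003 / 19170.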

This particular Git repository has 6723 commits and 905 trees that
weren't deltified.  That's a total of 4 MiB of uncompressed data,
most of which appears at the front of the pack.  Only 68 commits
were deltas but 8067 trees were made into deltas.  The compressed
commits seemed to occupy the first 2 MiB of the pack file; that's
25% of the 8 MiB pack.  A commit-specific pack-local dictionary
could be interesting here as it might save some pack space.
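
As a rough illustration of the dictionary idea only: zlib supports a
preset dictionary, and small commit-like buffers that share strings
with it tend to deflate smaller.  Nothing below reflects the pack
format; COMMIT, COMMIT_DICT and the helper names are invented for
this sketch, and the real savings on a real pack are unmeasured.

```python
import zlib

# Hypothetical commit object body and a hypothetical shared dictionary
# holding strings that recur across many commits.
COMMIT = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"parent 0123456789abcdef0123456789abcdef01234567\n"
    b"author A U Thor <author@example.com> 1160000000 -0400\n"
    b"committer A U Thor <author@example.com> 1160000000 -0400\n"
    b"\n"
    b"Fix a typo\n"
)
COMMIT_DICT = (
    b"tree \nparent \n"
    b"author A U Thor <author@example.com> \n"
    b"committer A U Thor <author@example.com> "
)

def deflate(data, zdict=None):
    # Compress with an optional preset dictionary.
    c = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return c.compress(data) + c.flush()

def inflate(data, zdict=None):
    # The reader must supply the same dictionary the writer used.
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(data) + d.flush()

plain = deflate(COMMIT)
with_dict = deflate(COMMIT, COMMIT_DICT)
```

Comparing len(plain) with len(with_dict) shows the per-object
effect.  The catch is that the dictionary itself would have to be
stored in the pack and be available to every reader.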


I'm going to shut up now and not say anything further on the subject
unless I've got some hard results indicating a different organization
is better or worse than what we have right now.

-- 
Shawn.