Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> Shawn Pearce wrote:
>
> > From a data locality perspective putting the base object before
> > or after the delta shouldn't matter, as either way the delta
> > is useless without the base. So placing the base immediately
> > before the delta should perform just as well as placing it after.
> > Either way the OS should have the base in cache by the time the
> > delta is being accessed.
>
> _Should_ perform? Have you got any measurements of speed of creating "base
> before delta" pack, and reading objects from this kind of pack?

No, not yet. It just seemed odd to me that the base was put behind
the delta, which then forces unpack-objects to hold a delta in memory
until it finds the corresponding base later in the stream, when it
could have been just as simple to require that the base appear before
the delta. I wondered what the rationale was for the additional
complexity in unpack-objects.

Nicolas' reply pointed out that the current arrangement of base after
delta may actually offer improved performance due to the OS performing
read-ahead when you seek to the delta. But he also pointed out that
this base after delta situation should be rather rare, as we try to
delta older objects against newer objects and we try to place newer
objects at the front of the pack, so it likely shouldn't matter
that much.

I just instrumented builtin-pack-objects.c to count how many times
we put the delta before the base and then repacked a current Git
repo with `git repack -a -d -f`.

28167 objects, 19170 deltas. 6003 deltas appeared before their base
objects. So 31% of the deltas. That's certainly not the common case
but it does occur with some frequency.

However, re-sorting the output of verify-pack -v by offset and visually
looking at the entries, you can clearly see it doesn't happen very
often early in the pack. Most of the objects in the front of the
pack are undeltified commits.
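For anyone who wants to repeat this kind of count without patching
builtin-pack-objects.c, here is a rough Python sketch that works from
`git verify-pack -v` output instead. It is not the instrumentation used
for the numbers above. It assumes the line layout given in the
git-verify-pack documentation (object name, type, size, size in pack,
offset, then depth and base object name for deltas), and the function
name `count_deltas_before_base` is made up for illustration.

```python
OBJECT_TYPES = {"commit", "tree", "blob", "tag"}


def count_deltas_before_base(verify_pack_output):
    """Return (objects, deltas, deltas_before_base) for one pack listing."""
    offsets = {}   # object name -> offset in the pack file
    deltas = []    # (delta object name, base object name)

    for line in verify_pack_output.splitlines():
        fields = line.split()
        # Object lines have 5 fields (non-delta) or 7 (delta); anything
        # else is a summary line such as "non delta: N objects".
        if len(fields) not in (5, 7) or fields[1] not in OBJECT_TYPES:
            continue
        name = fields[0]
        offsets[name] = int(fields[4])
        if len(fields) == 7:
            deltas.append((name, fields[6]))

    # A delta precedes its base when its offset is smaller. Bases that
    # are not listed (e.g. in a thin pack) are skipped, not guessed.
    before = sum(
        1 for name, base in deltas
        if base in offsets and offsets[name] < offsets[base]
    )
    return len(offsets), len(deltas), before
```

Feeding it the text printed by `git verify-pack -v` for a pack's .idx
file should give the three counts for that pack. With the figures
quoted above, 6003 of 19170 deltas works out to roughly 31%.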
This particular Git repository has 6723 commits and 905 trees that
weren't deltified. That's a total of 4 MiB of uncompressed data,
most of which appears at the front of the pack. Only 68 commits were
deltas but 8067 trees were made into deltas.

The compressed commits seemed to occupy the first 2 MiB of the pack
file; that's 25% of the 8 MiB pack. A commit-specific pack-local
dictionary could be interesting here as it might save some pack space.

I'm going to shut up now and not say anything further on the subject
unless I've got some hard results indicating a different organization
is better or worse than what we have right now.

-- 
Shawn.