Re: Achieving efficient storage of weirdly structured repos

On Thu, 3 Apr 2008, Jakub Narebski wrote:

> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
> 
> > On Thu, 3 Apr 2008, Roman Shaposhnik wrote:
> >> 
> >> The last item (trees) also seems to take the most space and the most 
> >> reasonable explanation that I can offer is that NetBeans repository has 
> >> a really weird structure where they have approximately 700 (yes, seven 
> >> hundred!) top-level subdirectories there. They are clearly 
> >> Submodules-shy, but that's another issue that I will need to address 
> >> with them.
> > 
> > Trees taking the biggest amount of space is not unheard of, and it may 
> > also be that the name heuristics (for finding good packing partners) could 
> > be failing, which would result in a much bigger pack than necessary. 
> > 
> > So if you already did an aggressive repack like the above, I'd happily 
> > take a look at whether maybe it's bad heuristics for finding tree objects 
> > to pair up for delta-compression. Do you have a place where you can put 
> > that repo for people to clone and look at? 
> 
> Hmmm... I wonder if this is the kind of case that would speed up
> development of pack v4.

Not really.  Pack v4 won't magically shrink a repository to less than 
half the pack v3 size.

I think we're simply facing the same situation as with the initial GCC 
repository, which shrank from 3GB down to 300MB or so once its 
misfitted repacking parameters were corrected.
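Linus's remark about name heuristics for finding packing partners refers to how pack-objects orders candidate objects before the delta search, roughly by type, then by a hash of the path name, then by size, so that objects likely to delta well land near each other. The function below is a Python sketch modelled on git's path-name hash (the C code has shifted slightly between versions, so treat it as an approximation, not the reference implementation): each new character goes into the top byte while earlier ones are shifted right, so the end of the path dominates the ordering.

```python
from typing import Optional

# ASCII whitespace as recognised by C isspace() in the "C" locale
_WHITESPACE = b" \t\n\v\f\r"

def pack_name_hash(name: Optional[str]) -> int:
    """Sketch of git's path-name hash used to order delta candidates.

    Each character is added into the top byte while earlier characters
    are shifted right by 2 bits, so the last characters weigh the most.
    """
    if not name:
        return 0
    h = 0
    for c in name.encode():  # iterate over raw bytes, as the C code does
        if c in _WHITESPACE:
            continue  # whitespace is skipped entirely
        h = ((h >> 2) + (c << 24)) & 0xFFFFFFFF  # emulate uint32_t wraparound
    return h

names = ["a.c", "b.h", "c.c", "d.h"]
print(sorted(names, key=pack_name_hash))  # ['a.c', 'c.c', 'b.h', 'd.h']
```

Sorting by this hash groups the two `.c` files together, followed by the two `.h` files, which is exactly the clustering the delta window relies on.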

> If I remember correctly one of bigger changes
> was the way trees were represented in pack; the biggest improvement
> was for trees.

Yes, but that was less about size than about access speed, achieved by 
not deflating them. The pack v4 tree representation would certainly 
help, of course, but I suspect that simply repacking with more 
aggressive window/depth arguments would be even more effective in this 
case.
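Why window and depth matter can be shown with a toy model of the delta search. pack-objects compares each object only against a limited window of preceding candidates (`--window`) and caps how long a delta chain may grow (`--depth`); an aggressive repack such as `git repack -a -d -f --window=250 --depth=250` widens both (the values are only an example), and `-f` makes git recompute deltas instead of reusing existing ones. The sketch below is a minimal illustration under loudly stated assumptions: the cost function is a hypothetical prefix/suffix measure, not git's delta encoder, and the greedy search ignores git's other heuristics.

```python
from typing import List

def toy_delta_size(base: bytes, target: bytes) -> int:
    """Hypothetical cost model, NOT git's delta encoder: bytes of target not
    covered by a shared prefix/suffix with base, plus a fixed 8-byte
    overhead standing in for the delta header."""
    n = min(len(base), len(target))
    p = 0
    while p < n and base[p] == target[p]:
        p += 1
    s = 0
    while s < n - p and base[-1 - s] == target[-1 - s]:
        s += 1
    return len(target) - p - s + 8

def toy_pack_size(objs: List[bytes], window: int, depth: int) -> int:
    """Greedy sliding-window delta search with a maximum chain depth.

    Each object may delta against any of the previous `window` objects
    whose own chain is shorter than `depth`; otherwise it is stored whole.
    Returns the total stored size under the toy cost model.
    """
    total = 0
    chain: List[int] = []  # chain[i]: delta-chain length of objs[i] (0 = whole)
    for i, target in enumerate(objs):
        best_cost, best_chain = len(target), 0  # default: store whole
        for j in range(max(0, i - window), i):
            if chain[j] >= depth:
                continue  # using this base would exceed the depth limit
            cost = toy_delta_size(objs[j], target)
            if cost < best_cost:
                best_cost, best_chain = cost, chain[j] + 1
        chain.append(best_chain)
        total += best_cost
    return total

# Two interleaved "files", each followed two positions later by a small edit.
objs = [b"x" * 100, b"y" * 100, b"x" * 99 + b"z", b"y" * 99 + b"z"]
print(toy_pack_size(objs, window=1, depth=50))  # 400: neighbours never match
print(toy_pack_size(objs, window=2, depth=50))  # 218: each edit finds its base
print(toy_pack_size(objs, window=2, depth=0))   # 400: depth 0 forbids deltas
```

When good delta partners sit just outside the window, as they might when many unrelated directories interleave, widening the window is what lets the search find them.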

> One of bigger hindrances, as I understand it, in developing pack v4
> was the fact that it didn't offer that much of improvement in typical
> cases for the work needed... but perhaps "your" repository would be
> good showcase for pack v4.

The biggest hindrance for pack v4 is actually the lack of native 
runtime tree walking; properly and optimally abstracting both tree 
object formats has not been looked at yet.

Speed is the primary goal for pack v4.  The fact that it also provides a 
10% pack reduction is merely incidental.  But without native tree 
walking we must recreate the legacy tree format on the fly each time a 
tree object is loaded, and that cost dwarfs any improvements pack v4 is 
aiming for (yes, it is still a little faster than pack v3, but not yet 
by enough to overcome the incompatibility costs).
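Recreating the legacy tree format on the fly means the following. Code written against the existing object format expects each tree as a sequence of `<mode> <name>\0<20-byte SHA-1>` entries in git's tree sort order, and those bytes must hash to the tree's object name, so every tree loaded from a v4 pack has to be re-serialized. A minimal Python sketch, assuming a hypothetical already-decoded list of `(mode, name, sha1)` tuples as input (this is not pack v4's actual encoding), shows the work involved:

```python
import hashlib
from typing import Iterable, Tuple

# (mode, name, raw 20-byte SHA-1): a hypothetical decoded entry, not pack v4's layout
Entry = Tuple[bytes, bytes, bytes]

def _sort_key(entry: Entry) -> bytes:
    mode, name, _ = entry
    # Directories (mode 40000) sort as if their name ended in '/'
    return name + b"/" if mode == b"40000" else name

def serialize_tree(entries: Iterable[Entry]) -> bytes:
    """Rebuild the canonical tree body: '<mode> <name>\\0<sha1>' per entry."""
    return b"".join(mode + b" " + name + b"\0" + sha1
                    for mode, name, sha1 in sorted(entries, key=_sort_key))

def tree_object_id(body: bytes) -> str:
    """Object name = SHA-1 over 'tree <size>\\0' followed by the body."""
    header = b"tree " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

print(tree_object_id(serialize_tree([])))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
```

Because directories sort as if their name ended in `/`, a file `foo.c` precedes a directory `foo`; emitting the entries in any other order would change the tree's SHA-1.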


Nicolas (who wishes he was still a student with plenty of hacking time)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
