Re: Partitioned packs

On Tue, 3 Apr 2007, Linus Torvalds wrote:
> 
> So trying to partition things doesn't help (because the objects are 
> already well sorted), and it does hurt.

Side note: I think that there *are* cases where partitioned packs can do 
better, but I think that in order to do better you should

 - partition by "recency", i.e. put objects that are not reachable from any 
   recent point in older packs.

 - make sure that the "packed_git" list is always sorted so that the older 
   data packs are at the end.

and that should actually speed up many loads, just because the recent 
objects are all in one pack, and because it's smaller, that pack can be 
looked up a bit faster.
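
To illustrate the two points above, here is a minimal sketch of a lookup over a pack list kept newest-first. It is not git's actual code: the `Pack` class, `sort_newest_first` and `find_pack` are hypothetical names, a sorted Python list stands in for the pack index, and a plain timestamp is assumed as a stand-in for "recency" (the text above partitions by reachability from recent points, which is not modelled here).

```python
from bisect import bisect_left


class Pack:
    """Hypothetical stand-in for one pack and its sorted index."""

    def __init__(self, name, mtime, object_ids):
        self.name = name
        self.mtime = mtime             # assumed proxy for how recent the pack is
        self.ids = sorted(object_ids)  # sorted ids, like a pack index

    def contains(self, oid):
        # Binary search in this pack's sorted id list.
        i = bisect_left(self.ids, oid)
        return i < len(self.ids) and self.ids[i] == oid


def sort_newest_first(packs):
    """Order the pack list so that older packs end up at the end."""
    return sorted(packs, key=lambda p: p.mtime, reverse=True)


def find_pack(packs, oid):
    """Return (pack name, packs probed), or (None, len(packs)) if missing."""
    for probed, pack in enumerate(packs, start=1):
        if pack.contains(oid):
            return pack.name, probed
    return None, len(packs)
```

With the list sorted this way, an object in the recent pack is found after probing one pack, while an old object pays for a failed search in every newer pack first.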

On the other hand, the power of a log(n) function like a binary search is 
that a lookup in one big pack that is four times the size of each of four 
smaller packs is really not all that much more expensive (about two extra 
comparisons), so the advantage is probably pretty small.
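
As a rough back-of-the-envelope check of that claim, the sketch below counts worst-case comparisons for a plain binary search. The function names and the object count in the usage note are assumptions picked for illustration, and the model ignores anything else a real index format might do to narrow the search range.

```python
def worst_case_probes(n):
    """Worst-case comparisons for a binary search over n sorted entries.

    For n >= 1 this is floor(log2(n)) + 1, which equals n.bit_length().
    """
    return n.bit_length()


def compare(n_small, packs=4):
    """Compare one big pack against `packs` smaller packs of n_small objects.

    Returns (probes in the one big pack,
             probes if the object is in the first small pack,
             probes if every small pack has to be searched).
    """
    one_big = worst_case_probes(n_small * packs)
    first_small = worst_case_probes(n_small)
    all_small = packs * first_small
    return one_big, first_small, all_small
```

For example, `compare(250_000)` returns `(20, 18, 72)`: the big pack costs two more comparisons than a hit in the first small pack, but an object that lives in the last of the four small packs costs far more than either.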

And for things that need old objects ("git blame" obviously tends to fall 
into that category), any partitioning is likely to be bad.

So I think partitioning is valid, but my suspicion is that you'd want to 
partition for *other* reasons than highest performance. Better reasons to 
have multiple packs:

 - just because you haven't repacked ;)
 - to keep "git repack" times down by marking old big packs as "keep" once 
   they get big enough (the space advantage of packing eventually flattens 
   out, so there's no real overwhelming reason to repack old stuff if you 
   have "enough")
 - filesystem and pack-file limitations (i.e. the 2**31 limit)

but I doubt performance is ever going to be a really compelling one.
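
The "keep" idea from the list above can be sketched as follows. Git treats a pack as kept when a file with the same base name and a `.keep` suffix sits next to it, and a full repack leaves such packs alone. The helper name `mark_big_packs` and the size threshold are assumptions for illustration; what counts as "big enough" is left to the reader.

```python
from pathlib import Path


def mark_big_packs(pack_dir, min_bytes):
    """Create an empty .keep file next to every pack of at least min_bytes.

    Returns the names of the .keep files that were newly created.
    """
    marked = []
    for pack in sorted(Path(pack_dir).glob("pack-*.pack")):
        keep = pack.with_suffix(".keep")
        # Skip small packs and packs that are already marked.
        if pack.stat().st_size >= min_bytes and not keep.exists():
            keep.touch()
            marked.append(keep.name)
    return marked
```

A call such as `mark_big_packs(".git/objects/pack", 512 * 1024 * 1024)` would mark every pack of 512 MiB or more, so that later repacks only have to deal with the smaller, newer packs.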

You can obviously always optimize for some very *particular* load by 
packing optimally for just that one (keep exactly the objects you need in 
one particular pack, don't even touch any other packs), but I don't think 
any load is *so* special that you shouldn't think of other loads.

			Linus