On Sat, Feb 23, 2008 at 02:36:59PM +0100, J.C. Pizarro wrote:
> On 2008/2/23, Charles Bailey <charles@xxxxxxxxxxxxx> wrote:
> >
> > It shouldn't matter how aggressively the repositories are packed or what
> > the binary differences are between the pack files are. git clone
> > should (with the --reference option) generate a new pack for you with
> > only the missing objects. If these objects are ~52 MiB then a lot has
> > been committed to the repository, but you're not going to be able to
> > get around a big download any other way.
>
> You're wrong, nothing has to be commited ~52 MiB to the repository.
>
> I'm not saying "commit", i'm saying
>
> "Assume A & B binary git repos and delta_B-A another binary file, i
> request built
> B' = A + delta_B-A where is verified SHA1(B') = SHA1(B) for avoiding
> corrupting".
>
> Assume B is the higher repacked version of "A + minor commits of the day"
> as if B was optimizing 24 hours more the minimum spanning tree. Wow!!!
>

I'm not sure that I understand where you are going with this.

Originally, you stated that if you clone a 775 MiB repository on day
one, and then you clone it again on day two when it was 777 MiB, then
you currently have to download 775 + 777 MiB of data, whereas you
could download a 52 MiB binary diff. I have no idea where that value
of 52 MiB comes from, and I've no idea how many objects were committed
between day one and day two. If we're going to talk about details,
then you need to provide more details about your scenario.

Having said that, here is my original point in some more detail. git
repositories are not binary blobs, they are object databases. Better
than this, they are databases of immutable objects. This means that to
get the difference between one database and another, you only need to
add the objects that are missing from the other database.
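
To make the "missing objects" idea concrete, here is a minimal Python
sketch. It is only an illustration, not how git itself is implemented:
the two repositories are modelled as plain dictionaries keyed by object
name, and the helper names (object_id, make_store, missing_objects) and
the sample contents are made up for this example. The one part taken
from git is the naming scheme for blobs, SHA-1 over a "blob <size>\0"
header followed by the content.

```python
import hashlib


def object_id(kind: str, data: bytes) -> str:
    """Name an object git-style: SHA-1 of "<type> <size>\\0" + content."""
    header = f"{kind} {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


def make_store(blobs):
    """Hypothetical stand-in for an object database: {object name: content}."""
    return {object_id("blob", b): b for b in blobs}


def missing_objects(have, want):
    """Objects present in `want` but absent from `have`.

    Objects are immutable and named by their content, so comparing the
    names alone is enough; no byte-level diff of the stores is needed.
    """
    return {oid: want[oid] for oid in want.keys() - have.keys()}


# Day one: two objects. Day two: the same two plus one new object.
day_one = make_store([b"README\n", b"int main(void) { return 0; }\n"])
day_two = make_store(
    [b"README\n", b"int main(void) { return 0; }\n", b"a small fix\n"]
)

to_send = missing_objects(day_one, day_two)
print(len(day_two), "objects on day two,", len(to_send), "to transfer")
```

However the two stores happen to be laid out on disk, the set that has
to be transferred is the same: only the objects that day one lacks.
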
If the two databases are actually one database and the same database a
short time later, then almost all the objects are going to be common
and the difference will be a small set of objects. Using git:// this
set of objects can be efficiently transferred as a pack file.

You may have a corner-case scenario where the following isn't true,
but in my experience an incremental pack file will be a more compact
representation of this difference than a binary difference of two
aggressively repacked git repositories as generated by a generic
binary difference engine.

I'm sorry if I've misunderstood your last point. If I have, perhaps
you could expand on the exact issue that you are having, as I'm not
sure that I've really answered your last message.