Re: With big repos and slower connections, git clone can be hard to work with

On Sat, Jun 08, 2024 at 11:40:47AM +0200, ellie wrote:

> Sorry if I'm misunderstanding, and I assume this is a naive suggestion that
> may not work in some way: but couldn't git somehow retain all the objects it
> already has fully downloaded cached? And then otherwise start over cleanly
> (and automatically), but just get the objects it already has from the local
> cache? In practice, that might already be enough to get through a longer
> clone despite occasional hiccups.

The problem is that the client/server communication does not share an
explicit list of objects. Instead, the client tells the server some
points in the object graph that it wants (i.e., the tips of some
branches that it wants to fetch) and that it already has (existing
branches, or nothing in the case of a clone), and then the server can do
its own graph traversal to figure out what needs to be sent.
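As a rough illustration of that want/have exchange, here is a toy sketch
in Python. It is not git's actual implementation: the dict-based graph,
the object names, and the helpers `reachable` and `objects_to_send` are
all made up for the example. The only point is that the server computes
"everything reachable from the wants, minus everything reachable from
the haves".

```python
from typing import Dict, List, Set

# Toy object graph (hypothetical): each object id maps to the ids it
# references directly. A commit points at its tree and parent commits,
# a tree points at blobs.
Graph = Dict[str, List[str]]


def reachable(graph: Graph, tips: Set[str]) -> Set[str]:
    """Return every object reachable from the given tips."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph.get(oid, []))
    return seen


def objects_to_send(graph: Graph, wants: Set[str], haves: Set[str]) -> Set[str]:
    """Server side: send what the wants reach, minus what the haves reach."""
    return reachable(graph, wants) - reachable(graph, haves)


graph: Graph = {
    "c2": ["t2", "c1"],  # newer commit: tree t2, parent c1
    "t2": ["b1", "b2"],
    "c1": ["t1"],        # root commit: tree t1
    "t1": ["b1"],
    "b1": [],
    "b2": [],
}

clone = objects_to_send(graph, wants={"c2"}, haves=set())   # nothing yet
fetch = objects_to_send(graph, wants={"c2"}, haves={"c1"})  # already has c1
print(sorted(clone))
print(sorted(fetch))
```

Note that the client never lists individual objects; it only names tips,
and the server derives the rest from the graph.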

When you've got a partially completed clone, the client can figure out
which objects it received. But it can't tell the server "hey, I have
commit XYZ, don't send that", because the server would assume that
having XYZ means the client has all of the objects reachable from there
(parent commits, their trees and blobs, and so on). And the pack does
not come in that order, so a truncated pack rarely contains everything
reachable from any given commit.
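To make the failure mode concrete, here is a small Python sketch under
the same kind of toy assumptions (the graph, the object names, and the
`reachable` helper are hypothetical, not git code). The client received
only part of the pack, claims "have c2" anyway, and the server then
sends nothing, leaving the client with holes.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def reachable(graph: Graph, tips: Set[str]) -> Set[str]:
    """Return every object reachable from the given tips."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph.get(oid, []))
    return seen


graph: Graph = {
    "c2": ["t2", "c1"],
    "t2": ["b1", "b2"],
    "c1": ["t1"],
    "t1": ["b1"],
    "b1": [],
    "b2": [],
}

# Assumption for the example: the connection dropped after only the
# commit c2 and the blob b1 had arrived.
received = {"c2", "b1"}

# The client retries and (incorrectly) advertises "have c2". The server
# takes that to mean everything reachable from c2 is present.
server_sends = reachable(graph, {"c2"}) - reachable(graph, {"c2"})

# What the client is actually still missing after the retry.
still_missing = reachable(graph, {"c2"}) - received - server_sends
print(sorted(server_sends))
print(sorted(still_missing))
```

The retry "succeeds" with an empty response, but the trees, the parent
commit, and one blob never arrive.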

And even if there were a way to disable reachability analysis and send a
"raw" set of objects that we already have, it would be prohibitively
large. The full set of sha1 hashes for linux.git is over 200MB, so
naively saying "don't send object X, I have it" for each object would
approach that size.
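A back-of-the-envelope version of that size estimate, as a Python
sketch. The object count below is an assumption, a round number picked
so the result lines up with the 200MB figure above, not a measurement
of linux.git. A sha1 is 20 bytes in raw binary form and 40 bytes when
written as hex.

```python
SHA1_RAW_BYTES = 20  # binary sha1
SHA1_HEX_BYTES = 40  # same hash written as hex digits


def hash_list_size(num_objects: int, bytes_per_hash: int = SHA1_RAW_BYTES) -> int:
    """Bytes needed to list one hash per object, with no framing at all."""
    return num_objects * bytes_per_hash


# Assumed, illustrative object count (not measured).
assumed_objects = 10_000_000

raw_mb = hash_list_size(assumed_objects) / 1_000_000
hex_mb = hash_list_size(assumed_objects, SHA1_HEX_BYTES) / 1_000_000
print(f"raw: {raw_mb:.0f} MB, hex: {hex_mb:.0f} MB")
```

Any per-line framing in a real protocol would only add to this, and a
client on a slow connection would have to upload it before receiving
anything.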

It's possible the client could do some analysis to see if it has
complete segments of history. In practice it won't, because of the way
we order packfiles (objects are split by type, and then ordered roughly
reverse-chronologically through history). If the server re-ordered its
response to fill history from the bottom up, it would be possible. We
don't do that now because it's not really the optimal order for
accessing objects in day-to-day use, and the packfile the server sends
is stored directly on disk by the client.
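The effect of ordering can be sketched with a toy model in Python. The
graph, the two orderings, and the helpers `is_complete` and
`complete_commits` are hypothetical and much simpler than a real
packfile (no deltas, no compression). The sketch cuts each ordering off
at the same point and asks which commits the client could safely
advertise as "have".

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def is_complete(graph: Graph, oid: str, have: Set[str]) -> bool:
    """True if oid and everything reachable from it is in `have`."""
    seen: Set[str] = set()
    stack = [oid]
    while stack:
        cur = stack.pop()
        if cur not in have:
            return False
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(graph[cur])
    return True


def complete_commits(graph: Graph, commits: List[str], have: Set[str]) -> List[str]:
    """Commits whose full history is present, i.e. safe to send as 'have'."""
    return [c for c in commits if is_complete(graph, c, have)]


graph: Graph = {
    "c3": ["t3", "c2"], "c2": ["t2", "c1"], "c1": ["t1"],
    "t3": ["b3"], "t2": ["b2"], "t1": ["b1"],
    "b3": [], "b2": [], "b1": [],
}
commits = ["c1", "c2", "c3"]  # oldest to newest

# Split by type, newest first within each type (roughly the current order).
by_type = ["c3", "c2", "c1", "t3", "t2", "t1", "b3", "b2", "b1"]
# Hypothetical bottom-up order: oldest commit's objects first.
bottom_up = ["b1", "t1", "c1", "b2", "t2", "c2", "b3", "t3", "c3"]

cut = 6  # the transfer dies after six of nine objects
usable_by_type = complete_commits(graph, commits, set(by_type[:cut]))
usable_bottom_up = complete_commits(graph, commits, set(bottom_up[:cut]))
print(usable_by_type)
print(usable_bottom_up)
```

With the type-split order the client holds every commit and tree but no
blobs, so no commit is complete. With the bottom-up order the same
number of objects yields two complete commits that a retry could build
on.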

-Peff



