+cc Ben Peart, who sent "[RFC] Add support for downloading blobs on demand"
to the list recently. This proposal seems to have the same goal, so maybe
your review could go a long way here?

Thanks,
Stefan

On Tue, Mar 14, 2017 at 3:57 PM, Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
> As described in "Background" below, there have been at least 2 patch sets to
> support "partial clones" and on-demand blob fetches, where the server part
> that supports on-demand blob fetches was treated at least in outline. Here
> is a proposal treating that server part in detail.
>
> == Background
>
> The desire for Git to support (i) missing blobs and (ii) fetching them as
> needed from a remote repository has surfaced on the mailing list a few
> times, most recently in the form of RFC patch sets [1] [2].
>
> A local repository that supports (i) will be created by a "partial clone",
> that is, a clone with some special parameters (exact parameters are still
> being discussed) that does not download all blobs normally downloaded. Such
> a repository should support (ii), which is what this proposal describes.
>
> == Design
>
> A new endpoint "server" is created. The client will send a message in the
> following format:
>
> ----
> fbp-request = PKT-LINE("fetch-blob-pack")
>               1*want
>               flush-pkt
> want = PKT-LINE("want" SP obj-id)
> ----
>
> The client may send one or more SHA-1s for which it wants blobs, then a
> flush-pkt.
>
> The server will then reply:
>
> ----
> server-reply = flush-pkt | PKT-LINE("ERR" SP message)
> ----
>
> If there was no error, the server will then send the requested blobs in a
> packfile, formatted as described in "Packfile Data" in pack-protocol.txt
> with "side-band-64k" enabled.
>
> Any server that supports "partial clone" will also support this, and the
> client will automatically assume this. (How a client discovers "partial
> clone" is not covered by this proposal.)
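
To make the request format above concrete, here is a rough sketch of how a client might serialize an fbp-request. It is only an illustration of the grammar as quoted: the function names (pkt_line, build_fbp_request) are hypothetical, and it assumes that no trailing LF is added to each pkt-line payload, since the grammar does not show one. The only part taken from existing Git is the pkt-line framing (4 hex digits giving the total length, including those 4 bytes, with "0000" as the flush-pkt).

```python
import re

FLUSH_PKT = b"0000"  # flush-pkt: a length field of zero and no payload

# Assumption: obj-id is a 40-character lowercase hex SHA-1, as in the proposal.
_SHA1_RE = re.compile(r"^[0-9a-f]{40}$")


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then data."""
    length = len(payload) + 4  # the length field counts its own 4 bytes
    return "{:04x}".format(length).encode("ascii") + payload


def build_fbp_request(object_ids):
    """Build PKT-LINE("fetch-blob-pack") 1*want flush-pkt (hypothetical helper)."""
    if not object_ids:
        raise ValueError("the grammar requires at least one want line")
    parts = [pkt_line(b"fetch-blob-pack")]
    for oid in object_ids:
        if not _SHA1_RE.match(oid):
            raise ValueError("not a 40-hex-digit SHA-1: %r" % (oid,))
        parts.append(pkt_line(b"want " + oid.encode("ascii")))
    parts.append(FLUSH_PKT)
    return b"".join(parts)
```

For a single blob, build_fbp_request(["a" * 40]) yields the "fetch-blob-pack" line, one "want" line, and the terminating flush-pkt, which is the whole of what the client sends before reading the server-reply.
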
>
> The server will perform reachability checks on requested blobs through the
> equivalent of "git rev-list --use-bitmap-index" (like "git upload-pack" when
> using the allowreachablesha1inwant option), unless configured to suppress
> reachability checks through a config option. The server administrator is
> strongly advised to regularly regenerate the bitmap (or suppress
> reachability checks).
>
> === Endpoint support for forward compatibility
>
> This "server" endpoint requires that the first line be understood, but will
> ignore any other lines starting with words that it does not understand. This
> allows new "commands" to be added (distinguished by their first lines) and
> existing commands to be "upgraded" with backwards compatibility.
>
> === Related improvements possible with new endpoint
>
> Previous protocol upgrade suggestions have had to face the difficulty of
> allowing updated clients to discover the server support while not slowing
> down (for example, through extra network round-trips) any client, whether
> non-updated or updated. The introduction of "partial clone" allows clients
> to rely on the guarantee that any server that supports "partial clone"
> supports "fetch-blob-pack", and we can extend the guarantee to other
> protocol upgrades that such repos would want.
>
> One such upgrade is "ref-in-want" [3]. The full details can be obtained from
> that email thread, but to summarize, the patch set eliminates the need for
> the initial ref advertisement and allows communication in ref name globs,
> making it much easier for multiple load-balanced servers to serve large
> repos to clients - this is something that would greatly benefit the Android
> project, for example, and possibly many others.
>
> Bundling support for "ref-in-want" with "fetch-blob-pack" simplifies matters
> for the client in that a client needs to only handle one "version" of server
> (a server that supports both).
> If "ref-in-want" were added later, instead of
> now, clients would need to be able to handle two "versions" (one with only
> "fetch-blob-pack" and one with both "fetch-blob-pack" and "ref-in-want").
>
> As for its implementation, that email thread already contains a patch set
> that makes it work with the existing "upload-pack" endpoint; I can update
> that patch set to use the proposed "server" endpoint (with a
> "fetch-commit-pack" message) if need be.
>
> == Client behavior
>
> This proposal is concerned with server behavior only, but it is useful to
> envision how the client would use this to ensure that the server behavior is
> useful.
>
> === Indication to use the proposed endpoint
>
> The client will probably already record that at least one of its remotes
> (the one that it successfully performed a "partial clone" from) supports
> this new endpoint (if not, it can’t determine whether a missing blob was
> caused by repo corruption or by the "partial clone"). This knowledge can be
> used to know that the server supports both "fetch-blob-pack" and
> "fetch-commit-pack" (for the latter, the client can fall back to
> "fetch-pack"/"upload-pack" when fetching from other servers).
>
> === Multiple remotes
>
> Fetches of missing blobs should (at least by default?) go to the remote that
> sent the tree that points to them. This means that if there are multiple
> remotes, the client needs to remember which remote it learned about a given
> missing blob from.
>
> == Alternatives considered
>
> The "fetch-blob-pack" and "fetch-commit-pack" messages could be split into
> their own endpoints. It seemed more reasonable to combine them together
> since they serve similar use cases (large repos), and doing so (for example)
> reduces the number of binaries in PATH, but I do not feel strongly about this.
>
> The client could supply commit information about the blobs it wants (or
> other information that could help the reachability analysis).
> However, these
> lines wouldn’t be used by the proposed server design. And if we do discover
> that these lines are useful, the protocol could be extended with new lines
> that contain this information (since old servers will ignore all lines that
> they do not understand).
>
> We could extend "upload-pack" to allow blobs in "want" lines instead of
> having a new endpoint. Due to a quirk in the Git implementation (but
> possibly not other implementations like JGit), this is already supported
> [4]. However, each invocation would require the server to generate an
> unnecessary ref list, and would require both the server and the client to
> undergo more network traffic.
>
> Also, the new "server" endpoint might be made to be discovered through
> another mechanism (for example, a capability advertisement on another
> endpoint). It is probably simpler to tie it to the "partial clone" feature,
> though, since they are so likely to be used together.
>
> [1] <20170304191901.9622-1-markbt@xxxxxxxxxx>
> [2] <1488999039-37631-1-git-send-email-git@xxxxxxxxxxxxxxxxx>
> [3] <cover.1485381677.git.jonathantanmy@xxxxxxxxxx>
> [4] <20170309003547.6930-1-jonathantanmy@xxxxxxxxxx>