(just cc-ing René Scharfe, archive expert; Peff; Dscho; Franck Bui-Huu
to see how his creation is evolving)

Josh Steadmon wrote:
> # Supporting HTTP remotes in "git archive"
>
> We would like to allow remote archiving from HTTP servers. There are a
> few possible implementations to be discussed:
>
> ## Shallow clone to temporary repo
>
> This approach builds on existing endpoints. Clients will connect to the
> remote server's git-upload-pack service and fetch a shallow clone of
> the requested commit into a temporary local repo. The write_archive()
> function is then called on the local clone to write out the requested
> archive.
>
> ### Benefits
>
> * This can be implemented entirely in builtin/archive.c. No new service
> endpoints or server code are required.
>
> * The archive is generated and compressed on the client side. This
> reduces CPU load on the server (for compressed archives), which would
> otherwise be a potential DoS vector.
>
> * This provides a git-native way to archive from any HTTP server that
> supports the git-upload-pack service; some providers (including GitHub)
> do not currently allow the git-upload-archive service.
>
> ### Drawbacks
>
> * Archives generated on the client may not be bit-for-bit identical to
> those generated on the server, if the versions of git used on the
> client and on the server differ.
>
> * This requires higher bandwidth compared to transferring a compressed
> archive generated on the server.
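
To make the shallow-clone option a bit more concrete, the client-side
flow would roughly correspond to the following sequence of existing
commands (only a sketch; the URL and tag below are placeholders, and
fetching an arbitrary commit by object ID would additionally need
something like uploadpack.allowAnySHA1InWant enabled on the server):

  # rough command-line equivalent of the proposed client-side flow
  tmp=$(mktemp -d) &&
  git init --quiet --bare "$tmp" &&
  git -C "$tmp" fetch --quiet --depth=1 \
        https://example.com/project.git v2.24.0 &&
  git -C "$tmp" archive --format=tar.gz FETCH_HEAD >project.tar.gz
  rm -rf "$tmp"

The real implementation would presumably drive the same steps through
the internal fetch machinery and write_archive() rather than by
spawning commands.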

> ## Use git-upload-archive
>
> This approach requires adding support for the git-upload-archive
> endpoint to the HTTP backend. Clients will connect to the remote
> server's git-upload-archive service, and the server will generate the
> archive, which is then delivered to the client.
>
> ### Benefits
>
> * Matches existing "git archive" behavior for other remotes.
>
> * Requires less bandwidth to send a compressed archive than a shallow
> clone.
>
> * Resulting archive does not depend in any way on the client
> implementation.
>
> ### Drawbacks
>
> * Implementation is more complicated; it will require changes to (at
> least) builtin/archive.c, http-backend.c, and builtin/upload-archive.c.
>
> * Generates more CPU load on the server when compressing archives. This
> is potentially a DoS vector.
>
> * Does not allow archiving from servers that don't support the
> git-upload-archive service.
>
>
> ## Add a new protocol v2 "archive" command
>
> I am still a bit hazy on the exact details of this approach, so please
> forgive any inaccuracies (I'm a new contributor and haven't examined
> custom v2 commands in much detail yet).
>
> This approach builds on the existing v2 upload-pack endpoint. The
> client will issue an archive command (with options to select particular
> paths or a tree-ish). The server will generate the archive and deliver
> it to the client.
>
> ### Benefits
>
> * Requires less bandwidth to send a compressed archive than a shallow
> clone.
>
> * Resulting archive does not depend in any way on the client
> implementation.
>
> ### Drawbacks
>
> * Generates more CPU load on the server when compressing archives. This
> is potentially a DoS vector.
>
> * Servers must support the v2 protocol (although the client could
> potentially fall back to some other supported remote archive
> functionality).
>
> ### Unknowns
>
> * I am not clear on the relative complexity of this approach compared
> to the others, and would appreciate any guidance offered.
>
>
> ## Summary
>
> Personally, I lean towards the first approach. It could give us an
> opportunity to remove server-side complexity; there is no reason that
> the shallow-clone approach must be restricted to the HTTP transport,
> and we could re-implement other transports using this method.
> Additionally, it would allow clients to pull archives from remotes
> that would not otherwise support it.
>
> That said, I am happy to work on whichever approach the community deems
> most worthwhile.
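
For comparison, this is roughly what remote archiving looks like today
for transports that already support git-upload-archive (the URL and tag
are just examples):

  # works today over transports with upload-archive support, e.g. ssh://
  git archive --remote=ssh://git@example.com/project.git \
        --format=tar.gz -o project.tar.gz v2.24.0

Whichever approach is picked, the end goal would presumably be to let
the same invocation work against an https:// URL, which currently fails
because there is no upload-archive support over HTTP.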