# Supporting HTTP remotes in "git archive"
We would like to allow remote archiving from HTTP servers. There are a
few possible implementations to be discussed:
## Shallow clone to temporary repo
This approach builds on existing endpoints. Clients will connect to the
remote server's git-upload-pack service and fetch a shallow clone of the
requested commit into a temporary local repo. The write_archive()
function is then called on the local clone to write out the requested
archive.
### Benefits
* This can be implemented entirely in builtin/archive.c. No new service
endpoints or server code are required.
* The archive is generated and compressed on the client side. This
reduces CPU load on the server (for compressed archives) which would
otherwise be a potential DoS vector.
* This provides a git-native way to archive from any HTTP server that
supports the git-upload-pack service; some providers (including GitHub)
do not currently allow the git-upload-archive service.
### Drawbacks
* Archives generated remotely may not be bit-for-bit identical to those
generated locally if the versions of git on the client and on the
server differ.
* This requires higher bandwidth compared to transferring a compressed
archive generated on the server.
## Use git-upload-archive
This approach requires adding support for the git-upload-archive
endpoint to the HTTP backend. Clients will connect to the remote
server's git-upload-archive service and the server will generate the
archive which is then delivered to the client.
### Benefits
* Matches existing "git archive" behavior for other remotes.
* Requires less bandwidth to send a compressed archive than a shallow
clone.
* Resulting archive does not depend in any way on the client
implementation.
### Drawbacks
* Implementation is more complicated; it will require changes to (at
least) builtin/archive.c, http-backend.c, and
builtin/upload-archive.c.
* Generates more CPU load on the server when compressing archives. This
is potentially a DoS vector.
* Does not allow archiving from servers that don't support the
git-upload-archive service.
## Add a new protocol v2 "archive" command
I am still a bit hazy on the exact details of this approach, so please
forgive any inaccuracies; I'm a new contributor and haven't yet
examined custom v2 commands in much detail.
This approach builds off the existing v2 upload-pack endpoint. The
client will issue an archive command (with options to select particular
paths or a tree-ish). The server will generate the archive and deliver
it to the client.
### Benefits
* Requires less bandwidth to send a compressed archive than a shallow
clone.
* Resulting archive does not depend in any way on the client
implementation.
### Drawbacks
* Generates more CPU load on the server when compressing archives. This
is potentially a DoS vector.
* Servers must support the v2 protocol (although the client could
potentially fall back to some other supported remote archive
functionality).
### Unknowns
* I am not clear on the relative complexity of this approach compared to
the others, and would appreciate any guidance offered.
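To make the idea concrete, a hypothetical exchange might look like the following, modeled loosely on the framing of existing v2 commands such as ls-refs. The command name, argument keywords, and response layout here are all invented for illustration and would need to be defined properly in the protocol documentation.

```
C: command=archive
C: 0001                          # delim-pkt: end of capabilities
C: tree-ish=v2.40.0
C: path=Documentation/
C: 0000                          # flush-pkt: end of request
S: <archive stream, e.g. in pkt-line sideband>
S: 0000
```

The server would run the equivalent of write_archive() on its side and stream the result, so the bandwidth and CPU trade-offs match the git-upload-archive approach.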
## Summary
Personally, I lean towards the first approach. It could give us an
opportunity to remove server-side complexity; there is no reason that
the shallow-clone approach must be restricted to the HTTP transport, and
we could re-implement other transports using this method. Additionally,
it would allow clients to pull archives from remotes that would not
otherwise support it.
That said, I am happy to work on whichever approach the community deems
most worthwhile.