Re: Proposed approaches to supporting HTTP remotes in "git archive"

René Scharfe <l.s.r@xxxxxx> · Sun, 29 Jul 2018 13:54:37 +0200

On 28.07.2018 at 00:32, Junio C Hamano wrote:
> Josh Steadmon <steadmon@xxxxxxxxxx> writes:
> 
>> # Supporting HTTP remotes in "git archive"
>>
>> We would like to allow remote archiving from HTTP servers. There are a
>> few possible implementations to be discussed:
>>
>> ## Shallow clone to temporary repo
>>
>> This approach builds on existing endpoints. Clients will connect to the
>> remote server's git-upload-pack service and fetch a shallow clone of the
>> requested commit into a temporary local repo. The write_archive()
>> function is then called on the local clone to write out the requested
>> archive.

A prototype would require just a few lines of shell script, I guess.
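
Something like the following might do as a rough sketch.  It is untested
against real HTTP servers; the function name archive_via_clone and its
argument order are made up for illustration and not part of any
proposal.  It assumes that <ref> is something the server lets us fetch
directly (typically a branch or tag name) and that the server supports
shallow fetches:

```shell
#!/bin/sh
# Hypothetical helper: archive_via_clone <repo-url> <ref> [git-archive options...]
# Fetches a single commit into a throwaway bare repo, then archives it locally.
archive_via_clone () {
	url=$1 ref=$2
	shift 2

	# Temporary bare repository that holds the one-commit shallow fetch.
	tmp=$(mktemp -d) || return 1

	# --git-dir (rather than -C) keeps the caller's working directory,
	# so a relative "-o out.tar" still lands where the user expects.
	git init --quiet --bare "$tmp" &&
	git --git-dir="$tmp" fetch --quiet --depth=1 "$url" "$ref" &&
	git --git-dir="$tmp" archive "$@" FETCH_HEAD
	status=$?

	# Always clean up the temporary clone, even on failure.
	rm -rf "$tmp"
	return $status
}
```

It would be called along the lines of
"archive_via_clone <repo> master --format=tar >out.tar".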

A downside that was only stated implicitly: this method needs temporary
disk space for the clone, while the existing archive modes only ever
write out the resulting file.  I guess the required space is of the same
order of magnitude as the compressed archive.  This shouldn't be a
problem if we assume the user would eventually want to extract the
archive's contents, right?

>> ## Summary
>>
>> Personally, I lean towards the first approach. It could give us an
>> opportunity to remove server-side complexity; there is no reason that
>> the shallow-clone approach must be restricted to the HTTP transport, and
>> we could re-implement other transports using this method.  Additionally,
>> it would allow clients to pull archives from remotes that would not
>> otherwise support it.
> 
> I consider the first one (i.e. make a shallow clone and tar it up
> locally) a hack that does *not* belong to "git archive --remote"
> command, especially when it is only done to "http remotes".  The
> only reason HTTP remotes are special is because there is no ready
> "http-backend" equivalent that passes the "git archive" traffic over
> smart-http transport, unlike the one that exists for "git
> upload-pack".
> 
> It however still _is_ attractive to drive such a hack from "git
> archive" at the UI level, as the end users do not care how ugly the
> hack is ;-)  As you mentioned, the approach would work for any
> transport that allows one-commit shallow clone, so it might become
> more palatable if it is designed as a different _mode_ of operation
> of "git archive" that is orthogonal to the underlying transport,
> i.e.
> 
>    $ git archive --remote=<repo> --shallow-clone-then-local-archive-hack master
> 
> or
> 
>    $ git config archive.<repo>.useShallowCloneThenLocalArchiveHack true
>    $ git archive --remote=<repo> master

Archive-via-clone would also work with full clones (if shallow ones are
not available), but that would be wasteful and a bit cruel, of course.

Anyway, I think we should find a better (shorter) name for that option;
that could turn out to be the hardest part. :)

> It might turn out that it may work better than the native "git
> archive" access against servers that offer both shallow clone
> and native archive access.  I doubt a single-commit shallow clone
> would benefit from reusing of premade deltas and compressed bases
> streamed straight out of packfiles from the server side that much,
> but you'd never know until you measure ;-)

It could benefit from GIT_ALTERNATE_OBJECT_DIRECTORIES, but I guess
typical users of git archive --remote won't have any good ones lying
around.

René