Re: How to determine when to stop receiving pack content

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 11 Aug 2019 15:38:50 -0700

"Farhan Khan" <farhan@farhan.codes> writes:

> I am trying to write an implementation of git clone over ssh and
> am a little confused how to determine a server response has
> ended. Specifically, after a client sends its requested 'want',
> the server sends the pack content over. However, how does the
> client know to stop reading data? If I run a simple read() of the
> file descriptor:
>
> A. If I use a blocking read, the client will wait until new data
>    is available, potentially forever.
> B. If I use a non-blocking read, the client might stop reading
>    when in reality new data is still in transit.

It's a TCP stream, so a blocking read will tell you when the other
side finishes talking to you and disconnects.  Your read() will
signal EOF.  If you are paranoid and want to protect your reader
against a malicious writer, then you cannot trust anything the other
side says (including possibly any "I have N megabytes of data" kind
of length information), so you'd need to set up a timeout to get
yourself out of a stuck read, but that is neither news nor
rocket surgery ;-)
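
To illustrate the "blocking read plus timeout" idea, here is a minimal
sketch in Python.  It is not what Git itself does; the function name
read_until_eof, the 30-second default and the chunk size are all made
up for this example, and it assumes a Unix-like system where select()
works on the descriptor connected to the ssh process.

```python
import os
import select


def read_until_eof(fd, timeout=30.0, chunk_size=65536):
    """Read from fd until the peer closes its end (EOF).

    Raises TimeoutError if no data arrives within `timeout` seconds,
    so that a silent or malicious peer cannot stall us forever.
    """
    chunks = []
    while True:
        # Wait until fd is readable, or give up after `timeout` seconds.
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            raise TimeoutError("peer sent nothing for %.1f seconds" % timeout)
        data = os.read(fd, chunk_size)
        if not data:
            # A zero-length read is how read() signals EOF.
            break
        chunks.append(data)
    return b"".join(chunks)
```

The timeout here is per read, not for the whole transfer; a real
client would probably want an overall limit on time and size as well.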

The "upload-pack" (the component that talks with your "fetch" and
"clone"), after negotiating what objects to include in the data
transfer with the program on your side, produces a pack data stream,
and is allowed to send additional "garbage" after that.

The receiving end, after finishing the negotiation, reads the pack
data stream (there is only one packfile's contents in it) and parses
it according to the packfile format so that it can find the end
(cf. Documentation/technical/pack-format.txt).

After seeing the end of the pack stream, anything that follows is
"garbage" and is generally passed through to the standard output.
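
As a rough illustration of "parse it so that you can find the end",
here is a sketch in Python of walking a pack that is already in
memory.  It follows my reading of pack-format.txt (12-byte header,
then for each object a variable-length type/size header, optional
delta base information and a zlib stream, then a trailing checksum).
The function name find_pack_end is invented for this example, it
assumes a SHA-1 repository (20-byte object names and trailer), and it
ignores the side-band framing a real fetch may negotiate.  The real
code in unpack-objects and index-pack is streaming and far more
careful.

```python
import hashlib
import struct
import zlib


def find_pack_end(buf):
    """Return the offset just past the pack trailer inside buf."""
    if buf[:4] != b"PACK":
        raise ValueError("missing PACK signature")
    version, count = struct.unpack(">II", buf[4:12])
    if version not in (2, 3):
        raise ValueError("unsupported pack version %d" % version)
    pos = 12
    for _ in range(count):
        # First header byte: continuation bit, 3-bit type, 4 size bits.
        byte = buf[pos]
        pos += 1
        obj_type = (byte >> 4) & 7
        while byte & 0x80:
            # Remaining size bytes; the size is not needed to find the end.
            byte = buf[pos]
            pos += 1
        if obj_type == 6:
            # OFS_DELTA: skip the variable-length offset to the base.
            while True:
                byte = buf[pos]
                pos += 1
                if not byte & 0x80:
                    break
        elif obj_type == 7:
            # REF_DELTA: skip the 20-byte name of the base object.
            pos += 20
        # Let zlib tell us where this object's compressed data stops.
        inflater = zlib.decompressobj()
        inflater.decompress(buf[pos:])
        if not inflater.eof:
            raise ValueError("truncated object data")
        pos = len(buf) - len(inflater.unused_data)
    end = pos + 20
    if len(buf) < end:
        raise ValueError("truncated pack trailer")
    # The trailer is the SHA-1 of everything that precedes it.
    if hashlib.sha1(buf[:pos]).digest() != buf[pos:end]:
        raise ValueError("pack checksum mismatch")
    return end
```

With that, buf[:end] is the pack and buf[end:] is whatever "garbage"
followed it.  Slicing the buffer for every object is wasteful for
large packs; it is only meant to make the idea easy to see.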

There are two codepaths on the receiving end ("unpack-objects" and
"index-pack --stdin").  Most likely an initial "clone" would end up
following the latter, but for educational purposes, the unpack-objects
codepath may be easier to follow.  These two codepaths are morally equivalent
at the higher conceptual levels.