Re: [PATCH v3 0/4] Additional FAQ entries

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> · Thu, 4 Jul 2024 21:23:28 +0000

On 2024-07-04 at 05:22:27, Junio C Hamano wrote:
> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
> 
> > This series introduces some additional Git FAQ entries on various
> > topics.  They are all things I've seen in my professional life or on
> > Stack Overflow, so I've written documentation.
> >
> > There were some suggestions in the past that the text "modify, tamper
> > with, or buffer" might be somewhat redundant, but I've chosen to keep
> > the text as it is to avoid arguments like, "Well, buffering the entire
> > request or response isn't really modifying it, so Git should just work
> > in that situation," when we already know that doesn't work.
> 
> Buffering the entire thing will break because ...?  Deadlock?  Or is
> there anything more subtle going on?

When we use the smart HTTP protocol, the server sends keep-alive and
status messages as one of the data streams, which is important because
(a) the user is usually impatient and wants to know what's going on and
(b) it may take a long time to pack the data, especially for large
repositories, and sending no data may result in the connection being
dropped or the client being served a 500 by an intermediate layer.  We
know this does happen and I've seen reports of it.
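
To make the "one of the data streams" part concrete, here is a minimal
Python sketch of side-band demultiplexing.  It assumes the pkt-line
payloads have already been read off the wire and that the first byte of
each payload is the band number (1 for pack data, 2 for progress, 3 for
a fatal error), as in the side-band-64k capability.  The function name
demux_sideband is made up for illustration and is not Git's own code.

```python
def demux_sideband(payloads):
    """Split side-band payloads into pack data and progress messages.

    `payloads` is an iterable of bytes objects, one per pkt-line, with
    the band number in the first byte (assumed side-band-64k layout).
    """
    pack = bytearray()
    progress = []
    for payload in payloads:
        if not payload:
            raise ValueError("side-band payload is missing its band byte")
        band, data = payload[0], payload[1:]
        if band == 1:
            # Pack data.  An empty band-1 payload carries no data and
            # only serves to keep the connection alive.
            pack += data
        elif band == 2:
            # Progress and status text meant for the user.
            progress.append(data.decode("utf-8", "replace"))
        elif band == 3:
            raise RuntimeError("remote error: " + data.decode("utf-8", "replace"))
        else:
            raise ValueError("unknown side-band number %d" % band)
    return bytes(pack), progress
```

A proxy that holds the whole response back delivers all of these
payloads at once at the end, so the progress and keep-alive packets no
longer arrive while the server is still packing.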

We've also seen some cases where proxies refuse to accept
Transfer-Encoding: chunked (let's party like it's 1999) and send a 411
back since there's no Content-Length header.  That's presumably because
they want to scan the contents for "bad" data all in one chunk, but Git
has to stream the contents unless the data fits in the buffer.
(This is the one case where http.postBuffer actually makes a
difference.)  I very much doubt that the appliance actually wants to get
a 2 GiB payload to scan, since it probably doesn't have tons of memory
in the first place, but that is what it's asking for.
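
As a rough illustration of that decision, here is a Python sketch.  The
function name request_framing is hypothetical, the 1 MiB default matches
the documented default for http.postBuffer, and the exact boundary
condition (<= versus <) is an assumption rather than a reading of Git's
source.

```python
DEFAULT_POST_BUFFER = 1024 * 1024  # documented http.postBuffer default, 1 MiB


def request_framing(body_size, post_buffer=DEFAULT_POST_BUFFER):
    """Return the framing header a client would send for a POST body.

    A body that fits in the buffer is known in full before sending, so
    its length can be announced up front.  A larger body has to be
    streamed, which means chunked transfer encoding and no
    Content-Length header.
    """
    if body_size <= post_buffer:
        return {"Content-Length": str(body_size)}
    return {"Transfer-Encoding": "chunked"}
```

Under these assumptions, request_framing(2 * 1024**3) only returns a
Content-Length header if post_buffer is raised to at least 2 GiB, which
means holding the whole payload in memory on the client as well.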

> Are we affected by any frame boundary (do we even notice?) that
> happens at layer lower than our own pkt-line layer at all (i.e. we
> sent two chunks and we fail to work on them correctly if the network
> collapses them into one chunk, without changing a single byte, just
> changing the number of read() system calls that reads them?)?

No, that's not a problem.  We read four bytes for the pkt-line header,
and then we keep reading until we have the entire body that the length
in the header calls for.  This is also the way OpenSSL works for TLS
records and is known to work well.  If the underlying TCP connection
delivers only part of a packet (which can happen due to MTU), we'll just
block until the rest comes in, which is fine.
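
A minimal Python sketch of that read loop, for illustration only.  The
names read_exact, read_pkt_line and TrickleStream are invented here;
TrickleStream is just a stand-in for a socket that hands back a few
bytes per read() call.  The sketch handles the flush packet (0000) but
not the other special lengths.

```python
def read_exact(stream, n):
    """Read exactly n bytes, looping over short reads; raise on EOF."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream ended in the middle of a pkt-line")
        buf += chunk
    return bytes(buf)


def read_pkt_line(stream):
    """Return one pkt-line payload, or None for a flush packet (0000)."""
    header = read_exact(stream, 4)
    length = int(header, 16)  # four hex digits; the count includes the header
    if length == 0:
        return None
    if length < 4:
        raise ValueError("special pkt-line %r not handled in this sketch" % header)
    return read_exact(stream, length - 4)


class TrickleStream:
    """Test helper: returns at most `step` bytes per read() call."""

    def __init__(self, data, step):
        self._data = data
        self._pos = 0
        self._step = step

    def read(self, n):
        end = self._pos + min(n, self._step)
        chunk = self._data[self._pos:end]
        self._pos += len(chunk)
        return chunk
```

Feeding b"0009hello0006ab0000" through TrickleStream with a step of 1,
3 or 64 yields the same payloads, b"hello" and b"ab" followed by a
flush, because only the number of read() calls changes.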
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA