Re: [PATCHv2 0/10] pkt-line and remote-curl cleanups server

Shawn Pearce <spearce@xxxxxxxxxxx> · Tue, 19 Feb 2013 23:14:45 -0800

On Mon, Feb 18, 2013 at 1:49 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Jeff King <peff@xxxxxxxx> writes:
>
>> On Mon, Feb 18, 2013 at 01:29:16AM -0800, Junio C Hamano wrote:
>>
>>> > I just checked, and GitHub also does not send flush packets after ERR.
>>> > Which makes sense; ERR is supposed to end the conversation.
>>>
>>> Hmph.  A flush packet was supposed to be a mark to say "all the
>>> packets before this one can be buffered and kept without getting
>>> passed to write(2), but this and all such buffered data _must_ go on
>>> the wire _now_".  So in the sense, ERR not followed by a flush may
>>> not even have a chance to be seen on the other end, no?  That is
>>> what the comment before the implementation of packet_flush() is all
>>> about.
>>
>> Despite the name, I always thought of packet_flush as more of a signal
>> for the _receiver_, who then knows that they have gotten all of a
>> particular list. In other words, we seem to use it as a sequence point
>> as much as anything (mostly because we immediately write() out any other
>> packets anyway, so there is no flushing to be done; but perhaps there
>> were originally thoughts that we could do more buffering on the writing
>> side).
>
> Exactly.

This is also my understanding of the flush-pkt ("0000"). Its an
end-of-list/end-of-section marker to let the peer know the protocol is
moving to the next state. Except where the protocol can move to the
next state without a flush-pkt of course (see below).

> The logic behind the comment "we do not buffer, but we should" is
> that flush tells the receiver that the sending end is "done" feeding
> it a class of data, and the data before flush do not have to reach
> the receiver immediately, hence we can afford to buffer on the
> sending end if we can measure that doing so would give us better
> performance. We haven't measure and turned buffering on yet.

So funny story. JGit actually buffers the pkt-line writes in memory
and does flushes to the network socket when any of the following
happen:

- fixed size in-memory buffer is full (8k or 32k by default)
- flush-pkt is needed in the protocol
- JGit forces a flush without a flush-pkt

There are places in the protocol where data needs to be shared with
the peer *despite* not having a flush-pkt present in the data stream
to do that. ERR is one of these places. "done\n" at the end of the
negotiation in a client is another. Sending ACK/NAK in a multi_ack
from upload-pack is another. I may be missing more.

JGit had to define three methods to make the pkt-line protocol work correctly:

- writeString: format a string as a single pkt-line, insert into buffer.
- end: write "000" into the buffer, flush the buffer.
- flush: flush the buffer if it has any content.

I always found that comment in front of that function funny. Its
totally not true. Fixing it is harder than just stuffing a buffer in
there and hoping for the best. The callers need work. At which point
that function isn't really what its author was trying to do.

> But when dying, we may of course have data before flushing. We may
> disconnect (by dying) without showing flush (or preceding ERR) to
> the other side, and for that reason, not relying on the flush packet
> on the receiving end is a good change. Once we turn buffering on, we
> probably need to be careful when sending an ERR indicator by making
> it always drain any buffered data and show the ERR indicator without
> buffering, or something.

Yes.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html