Hi Peff, On Tue, 13 Oct 2020, Jeff King wrote: > On Tue, Oct 13, 2020 at 02:55:15PM -0400, Jeff King wrote: > > > But then in fbd76cd450 (sideband: reverse its dependency on pkt-line, > > 2019-01-16), the function became demultiplex_sideband(). The loop went > > away, and we pump only a single packet on each call. When we see > > sideband 2, we do an early return. But on sideband 1, we continue to the > > cleanup: label, which flushes the scratch buffer. > > > > I think we need to return from there without hitting that cleanup label, > > like this: > > By the way, the reason this works is that the "scratch" buffer persists > beyond individual calls to demultiplex_sideband(). So we can get away > with letting it persist between the two. The thing that threw me was that I somehow expected `recv_sideband()` to return primary data as it is read, much like `read()` operates. And yes, I also found the split version of the code (`recv_sideband()` contains the `while` loop and lives in `pkt-line.c` while `demultiplex_sideband()` contains the `scratch` handling and the `switch` between packet types and it lives in `sideband.c`) was much harder to read than the original version. > However unlike the original recv_sideband(), which broke out of the loop > on error and then flushed scratch, our post-fbd76cd450 does not do the > same. It now looks like: > > int recv_sideband(const char *me, int in_stream, int out) > { > char buf[LARGE_PACKET_MAX + 1]; > int len; > struct strbuf scratch = STRBUF_INIT; > enum sideband_type sideband_type; > > while (1) { > len = packet_read(in_stream, NULL, NULL, buf, LARGE_PACKET_MAX, > 0); > if (!demultiplex_sideband(me, buf, len, 0, &scratch, > &sideband_type)) > continue; > switch (sideband_type) { > case SIDEBAND_PRIMARY: > write_or_die(out, buf + 1, len - 1); > break; > default: /* errors: message already written */ > return sideband_type; > } > } > } > > I wondered if we could ever have a case where we broke out of the loop > with data left over in "scratch" (or its buffer left allocated). I think > the answer is no. We only break out of the loop if > demultiplex_sideband() returned non-zero _and_ its not the primary > sideband. So both before and after my suggested fix, we'd have hit the > cleanup label in demultiplex_sideband(), which flushes and frees the > buffer. Right. It took me quite a while to convince myself of that, too. And since I am really worried that the complexity of the code makes it easy to introduce a regression, I spent quite a bit of time to figure out how to implement a good regression test for exactly this issue. Even so, the regression test will not necessarily catch problems where the `while` loop is abandoned without processing any unfinished sideband message. I introduced a `BUG()` for that case, but it is quite a bit unsatisfying that I could not come up with a better way to ensure that this does not happen. > I do have to say that the original loop before that commit was a lot > easier to follow, though. > > Another weirdness I noticed is that it doesn't distinguish a flush > packet (expected) from a zero-byte packet (an error). But neither did > the original. I would have guessed the zero-byte packet was meant to > trigger this second conditional: > > if (len == 0) { > *sideband_type = SIDEBAND_FLUSH; > goto cleanup; > } > if (len < 1) { > strbuf_addf(scratch, > "%s%s: protocol error: no band designator", > scratch->len ? "\n" : "", me); > *sideband_type = SIDEBAND_PROTOCOL_ERROR; > goto cleanup; > } > > but we'd hit the first conditional before then. We would still trigger > the second if we saw EOF while reading the packet (because we set the > length to -1 then), but then it's arguably the wrong error to be > showing. > > So I think this could be improved a bit by using > packet_read_with_status() in the recv_sideband() caller. Possibly. But is it really a bug to send a zero-byte packet? It is inefficient, sure. But I could see how it could potentially simplify code, or serve e.g. as some sort of a "keep-alive" mechanism or whatever. In other words, I am not sure that we should treat this as an error, but yes, we should probably not treat it as a flush (even if it is likely that our current sideband users simply won't ever send empty primary packets). Ciao, Dscho