On Thu, Apr 23, 2020 at 02:37:35PM -0700, Jonathan Tan wrote:

> Thanks for the reproduction recipe (in [1]) and your analysis. I took a
> look, and it's because the check for in_vain is done differently. In v0:
>
>   if (got_continue && MAX_IN_VAIN < in_vain) {
>
> reflecting the documentation in pack-protocol.txt:
>
>   However, the 256 limit *only* turns on in the canonical client
>   implementation if we have received at least one "ACK %s continue"
>   during a prior round. This helps to ensure that at least one common
>   ancestor is found before we give up entirely.

Ah, thanks for that; I hadn't thought to look in that file for more clues.

> When debugging, I noticed that in_vain was increasing far in excess of
> MAX_IN_VAIN, but because got_continue was false, the client did not give
> up.
>
> But in v2:
>
>   if (!haves_added || *in_vain >= MAX_IN_VAIN) {
>
> ("haves_added" is irrelevant to this discussion. It is another
> termination condition - when we have run out of "have"s to send.)
>
> So there is no check that "continue" was sent. We probably should change
> v2 to match v0. I can start writing a patch unless someone else would
> like to take a further look at it.

Yeah, this fills in the final pieces of the puzzle I was chasing in:

  https://lore.kernel.org/git/20200422193324.GB558336@xxxxxxxxxxxxxxxxxxxxxxx/

And the patch you suggest sounds like the best solution. I think there's
some room for discussion about what the optimal strategies are (e.g., v0
does send a lot more haves than v2 in this instance, and it wouldn't
always be helpful). But it makes sense to me to put v2 and v0 on the same
footing for now, especially given the regressions people have mentioned,
and then we can explore new options at our convenience (like switching on
the skipping negotiation algorithm).

-Peff
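
For anyone following along, here is a minimal standalone sketch of the two termination checks quoted above, plus one possible way of guarding the v2 check. It is not the actual fetch-pack code and not a proposed patch. The function names are hypothetical, MAX_IN_VAIN is assumed to be 256 based on the "256 limit" wording in pack-protocol.txt, and in_vain is passed by value rather than through a pointer.

```c
#include <stdbool.h>

/* Assumed from the "256 limit" quoted from pack-protocol.txt. */
#define MAX_IN_VAIN 256

/*
 * v0 condition as quoted: the in_vain limit only applies once the
 * server has sent at least one "ACK %s continue".
 */
static bool v0_should_give_up(bool got_continue, int in_vain)
{
	return got_continue && MAX_IN_VAIN < in_vain;
}

/*
 * v2 condition as quoted: gives up when there are no more "have"s to
 * send, or when in_vain reaches the limit, whether or not any ACK
 * was ever seen.
 */
static bool v2_should_give_up(bool haves_added, int in_vain)
{
	return !haves_added || in_vain >= MAX_IN_VAIN;
}

/*
 * Hypothetical guarded variant (an assumption, not the real fix):
 * keep the haves_added check, but only honor the in_vain limit after
 * a "continue"-style ACK has been received, as v0 does.
 */
static bool v2_should_give_up_guarded(bool haves_added, bool got_continue,
				      int in_vain)
{
	return !haves_added || (got_continue && in_vain >= MAX_IN_VAIN);
}
```

With got_continue false, the v0 check never fires no matter how large in_vain gets, which matches the behavior seen while debugging. The unguarded v2 check fires as soon as in_vain hits 256. Note also that the two quoted conditions differ by one at the boundary (v0 uses a strict "<", v2 uses ">="); the sketch keeps each as quoted.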