On Tue, Nov 01 2022, Jeff King wrote:

> On Tue, Nov 01, 2022 at 09:15:00AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> > So I really didn't revisit this commit much at all, and was just trying
>> > to save Dscho (or Taylor) the work of having to rebase it, if we go with
>> > my patch 1.
>> >
>> > IMHO it is OK enough as it is, but if I were writing it from scratch I
>> > probably would have given the rationale that the tests are insisting on a
>> > dumb, sub-optimal behavior. And flakes or inconsistencies aside, they
>> > should be asserting only the presence or absence of the message. And
>> > probably would have left each at "grep" and dropped the test_line_count
>> > totally.
>>
>> Do you mean that even if we fix the bug and consistently emit one and
>> only one such message you'd like to have the tests not assert that
>> that's the case?
>
> No, I wouldn't mind it, if that is a bug we've fixed. I just mean that
> the tests as written never wanted to say "3 is the absolute right number
> of times to write this message". They only put "3" there because it made
> things pass.

That's one reason. Another is to assert current behavior, so that we can
answer questions like "does this patch change behavior?" without having
to recursively diff the trash directory output of a test run because
everything's so fuzzy.

If and when it's 1 instead of 3, great; adjusting the test isn't a big
deal.

Anyway, we're off into general testing philosophy again, which I think is
off topic here.

>> I do think that UX is important enough to test for, particularly if
>> we've had a bug related to that that we've fixed. I.e. if something in
>> the direction of my [1] goes in.
>
> Sure, I don't mind at all a test for it. In the short-term, if you want
> a test that fails, I'd prefer it be separate so that we can test the
> useful existing behavior that _does_ work. If the multiple-messages bug
> is fixed, I don't mind folding them together into a single test that
> passes.
Right, I'm not saying "keep the flaky test". I'm saying let's keep the
ones we know aren't flaky.

>> > It is not even clear to me that the remote-https is the one being
>> > swallowed (at least, I have not seen an argument or evidence that this
>> > is so; it does seem plausible).
>>
>> It is the case: the only ones that are going to be duplicated are the
>> "warn" ones, because for "die" we'll die right away in the parent
>> process.
>
> Right, I understand why "die" produces only one. My question was when we
> produce 2 on Windows (sometimes?) but 3 elsewhere, are we sure it is the
> one from remote-https that is eaten, or could it ever be one of the
> others?

I don't have a test case in front of me, and Johannes didn't provide one
(or even a link to CI output). But from his description, and being
familiar with the code, I'm pretty certain it's not the "die" cases:
those are all in-process and happen before we spawn sub-processes, so I
don't see how that would be different on Windows.

>> > I thought the point is that the outer program calling the helper would
>> > consistently produce the error, always yielding at least one instance.
>> > The helper one is generally "extra" and undesired.
>>
>> Yes, exactly. Which is the root cause that my fix[1] addresses.
>>
>> Anyway, I'm just trying to help here. If you/Johannes/others want to go
>> with the "hotfix" as-is, that's fine by me.
>>
>> I just don't see what the hurry is. It's been this way for two releases;
>> if it's flaky, that's been the case for months. I'd think we could just
>> fix the root cause.
>
> It recently bit me twice, so maybe I am giving it more urgency than it
> deserves (or maybe something changed in CI to make it more likely).

Bit you in GitHub Windows CI?

> I do think it would be nice to fix it. I don't love your patch for the
> reasons I replied there (not your fault; it's inherently a crappy and
> complicated problem).
> In the meantime, I'd like to see CI fixed, as
> it is wasting developers' time. And that's why I called Dscho's
> loosening "good enough". It is hopefully a temporary state anyway.
>
> But I would be just as happy to see a similar patch which just changed
> the 2/3 lines to "-ge 1" (or just a straight grep).

Sure, if we've decided we don't mind loosening tests that are unrelated
to the flakiness problem at hand.
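For concreteness, here is a rough sketch of the difference between the exact-count assertion and the two loosened forms mentioned above (a straight grep, or "-ge 1"). It is standalone POSIX shell, not git's test-lib: the warning text and the helper names (exact_count_ok, present_ok, at_least_ok) are hypothetical, and at_least_ok only approximates what "test_line_count -ge 1" would check.

```shell
#!/bin/sh
# Standalone sketch; does NOT use git's test-lib. The message text and the
# helper names below are hypothetical, for illustration only.

err=$(mktemp)
trap 'rm -f "$err"' EXIT

# Simulated stderr in which the duplicated warning showed up twice (as
# reportedly happens on Windows), not the three times an exact-count
# test would expect.
printf '%s\n' \
	"warning: hypothetical-message" \
	"warning: hypothetical-message" \
	"fatal: some other error" >"$err"

# Exact count: fails whenever the number of duplicates differs by platform.
exact_count_ok () {
	test "$(grep -c -e "$1" "$err")" -eq "$2"
}

# Loosened: only assert that the message appears at all ("straight grep").
present_ok () {
	grep -q -e "$1" "$err"
}

# Loosened: assert at least N occurrences, akin to "test_line_count -ge 1".
at_least_ok () {
	test "$(grep -c -e "$1" "$err")" -ge "$2"
}

if exact_count_ok "hypothetical-message" 3; then echo "exact (3): pass"; else echo "exact (3): fail"; fi
if present_ok "hypothetical-message"; then echo "present: pass"; else echo "present: fail"; fi
if at_least_ok "hypothetical-message" 1; then echo "at least 1: pass"; else echo "at least 1: fail"; fi
```

With the simulated output, the exact-count check fails while both loosened checks pass. That is the trade-off being argued over: the loose forms survive platform differences, but they no longer notice if the number of duplicated messages changes.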