On Tue, Aug 27, 2019 at 06:43:29PM -0700, Derrick Stolee via GitGitGadget wrote:
> Test t5516-fetch-push.sh has a test 'deny fetch unreachable SHA1,
> allowtipsha1inwant=true' that checks stderr for a specific error
> string from the remote. In some build environments the error sent
> over the remote connection gets mingled with the error from the
> die() statement. Since both signals are being output to the same
> file descriptor (but from parent and child processes), the output
> we are matching with grep gets split.

In the spirit of "An error message is worth a thousand words", I think
it's worth including the error message causing the failure:

  error: 'grep not our ref.*64ea4c133d59fa98e86a771eda009872d6ab2886 err' didn't find a match in:
  fatal: git upload-pack: not our ref 64ea4c13fatal: remote error: upload-pack: not our ref 63d59fa98e86a771eda009872d6ab2886
  4ea4c133d59fa98e86a771eda009872d6ab2886
  error: last command exited with $?=1

I tried to reproduce this specific error on Linux and macOS, but
haven't managed to yet.

> To reduce the risk of this failure, follow this process instead:

Here you talk about reducing the risk ...

> 1. Write an error message to stderr.
> 2. Write an error message across the connection.
> 3. exit(1).
>
> This reorders the events so the error is written entirely before
> the client receives a message from the remote, removing the race
> condition.

... but here you talk about removing the race condition.

I don't understand how this change would remove the race condition.
After all, the occasional failure is caused by two messages racing
through different file descriptors, and merely sending them in reverse
order doesn't change the fact that they would still be racing.
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>  upload-pack.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/upload-pack.c b/upload-pack.c
> index 222cd3ad89..b0d3e028d1 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -613,11 +613,12 @@ static void check_non_tip(struct object_array *want_obj,
>  	for (i = 0; i < want_obj->nr; i++) {
>  		struct object *o = want_obj->objects[i].item;
>  		if (!is_our_ref(o)) {
> +			warning("git upload-pack: not our ref %s",
> +				oid_to_hex(&o->oid));
>  			packet_writer_error(writer,
>  					    "upload-pack: not our ref %s",
>  					    oid_to_hex(&o->oid));
> -			die("git upload-pack: not our ref %s",
> -			    oid_to_hex(&o->oid));
> +			exit(1);

So, the error coming from the 'git fetch' command in question
currently looks like this:

  fatal: git upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
  fatal: remote error: upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886

Changing die() to a warning() changes the prefix of the message from
"fatal:" to "warning:", i.e. with this patch I got this:

  warning: git upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
  fatal: remote error: upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886

I don't think that "demoting" that message from fatal to warning
matters much to users, because they would see the (arguably redundant)
second line starting with "fatal:".

As for the problematic test, it checks this error with:

  test_i18ngrep "remote error:.*not our ref.*$SHA1_3\$" err

so changing that prefix shouldn't affect the test, either.
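
For what it's worth, that claim can be checked without running the
whole test script. The following is only a minimal sketch: it uses
plain 'grep' as a stand-in for test_i18ngrep, and the helper
matches_test_pattern is made up for illustration. It feeds the two
outputs quoted above to the pattern used by the test:

```shell
#!/bin/sh
# Stand-alone sketch: plain grep stands in for test_i18ngrep, and
# matches_test_pattern is a hypothetical helper, not part of the test suite.
SHA1_3=64ea4c133d59fa98e86a771eda009872d6ab2886

# Succeeds if stdin contains the line the test is looking for.
matches_test_pattern () {
	grep -q "remote error:.*not our ref.*$SHA1_3\$"
}

# stderr without the patch: the server side die()s.
printf '%s\n' \
	"fatal: git upload-pack: not our ref $SHA1_3" \
	"fatal: remote error: upload-pack: not our ref $SHA1_3" |
matches_test_pattern && echo "with die(): match"

# stderr with the patch: warning() followed by exit(1).
printf '%s\n' \
	"warning: git upload-pack: not our ref $SHA1_3" \
	"fatal: remote error: upload-pack: not our ref $SHA1_3" |
matches_test_pattern && echo "with warning(): match"
```

Both variants match, because the pattern only looks at the
"remote error:" line coming from the client side.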
Unfortunately, however, while running './t5516-fetch-push.sh -r 1,79
--stress' to try to reproduce a failure caused by those mingled
messages, the same check has so far only failed for a different reason
(both on Linux and on macOS (on Travis CI)):

  error: 'grep remote error:.*not our ref.*64ea4c133d59fa98e86a771eda009872d6ab2886$ err' didn't find a match in:
  fatal: git upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
  fatal: unable to write to remote: Broken pipe
  error: last command exited with $?=1

And with this patch:

  error: 'grep remote error:.*not our ref.*64ea4c133d59fa98e86a771eda009872d6ab2886$ err' didn't find a match in:
  warning: git upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
  fatal: unable to write to remote: Broken pipe
  error: last command exited with $?=1

We could make the test pass by relaxing the 'grep' pattern to look
only for 'not our ref.*<SHA...>', but I doubt that ignoring a broken
pipe is such a good idea.
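
To illustrate that concern, here is a rough sketch of what relaxing
the pattern would do with the output of the failed stress run. Again,
plain 'grep' stands in for test_i18ngrep, and the helpers
broken_pipe_err and check are hypothetical names used only here:

```shell
#!/bin/sh
# Stand-alone sketch; broken_pipe_err and check are hypothetical helpers.
SHA1_3=64ea4c133d59fa98e86a771eda009872d6ab2886

# The stderr seen in the failed stress runs (without the patch).
broken_pipe_err () {
	printf '%s\n' \
		"fatal: git upload-pack: not our ref $SHA1_3" \
		"fatal: unable to write to remote: Broken pipe"
}

# Prints "match" or "no match" for the pattern given as $1.
check () {
	if broken_pipe_err | grep -q "$1"
	then
		echo "match"
	else
		echo "no match"
	fi
}

# Pattern currently used by the test: needs the "remote error:" line.
echo "strict:  $(check "remote error:.*not our ref.*$SHA1_3\$")"
# Relaxed pattern: satisfied by the server-side message alone.
echo "relaxed: $(check "not our ref.*$SHA1_3\$")"
```

The strict pattern reports "no match" while the relaxed one reports
"match", i.e. the relaxed check would pass even though the client
never received the remote error and died with a broken pipe instead.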