Re: [PATCH 4/5] t6022, t6046: test expected behavior instead of testing a proxy for it

SZEDER Gábor <szeder.dev@xxxxxxxxx> · Fri, 13 Mar 2020 18:17:39 +0100

On Thu, Mar 12, 2020 at 09:01:57PM +0100, SZEDER Gábor wrote:
> > > > I saw this test fail today in one of my custom CI builds:
> > > >
> > > >   +git checkout change
> > > >   Switched to branch 'change'
> > > >   +test-tool chmtime =-1 M
> > > >   +test-tool chmtime --get M
> > > >   +GIT_MERGE_VERBOSITY=3 git merge change+rename
> > > >   +test-tool chmtime --get B
> > > >   +cat old-mtime
> > > >   +cat new-mtime
> > > >   +test 1583967731 -lt 1583967731
> > > >   error: last command exited with $?=1
> > > >   not ok 12 - merge of identical changes in a renamed file
> > > >
> > > > The contents of 'out', i.e. the output of the 'git merge' command
> > > > before the failure is:
> > > >
> > > >   Auto-merging B
> > > >   Merge made by the 'recursive' strategy.
> > > >    A => B | 0
> > > >    1 file changed, 0 insertions(+), 0 deletions(-)
> > > >    rename A => B (100%)
> > > >
> > > > This is a rare failure, this is the first time I saw it, and to make
> > > > things worse, this one time it happened on big-endian and with all the
> > > > GIT_TEST_* knobs enabled.
> > > >
> > > >   https://travis-ci.org/github/szeder/git-cooking-topics-for-travis-ci/jobs/661294571#L4020
> > >
> > > This is very troubling.  The workflow is basically:
> > >   - Manually set the mtime to a file to something old (I happened to
> > > pick 1 second before now, but picking something from 1970 would have
> > > been fine too).
> > >   - Run a merge which is known to need to overwrite the file.  Your
> > > output ("Auto-merging B") suggests that we should have been in such a
> > > case.
> > >   - Verify that the file was actually updated as expected.  Since the
> > > OS is supposed to update the mtime when it writes the file, it should
> > > have set it to something recent, i.e. something *different* than what
> > > it had before.
> > >
> > > So, now I'm left wondering how the mtime possibly could have been not
> > > updated.  Maybe the file wasn't actually written?  (But if so, why
> > > didn't other people see the failure?  Or your stress runs not see it?)
> > >  Or maybe it was written but all file contents and metadata were
> > > delayed in writing to disk such that a subsequent command still sees
> > > the old file??  Or maybe it was written but the mtime update was
> > > delayed and the test was able to check it in that intermediate
> > > state???  Or perhaps the mtime check before the merge raced with the
> > > setting of the mtime backwards and got the mtime before it was
> > > rewound????
> > >
> > > I don't have a plausible scenario under which any of these should be
> > > possible; I'm at a loss.
> > 
> > Hmm.  Maybe leap seconds, or clock updates via ntp at an unfortunate
> > time?
> 
> I'm now fairly confident that 'git merge' is OK, and suspect that it's
> an issue with Travis CI's s390x environment (multi-architecture
> support is an alpha-stage feature).

I could finally reproduce the issue on my own machine, so apparently
it's not an issue in Travis CI's multi-arch environments.

>   test_expect_success 'test' '
>   	touch file &&
>   	old=$(test-tool chmtime --get =-1 file) &&
>   	touch file &&
>   	new=$(test-tool chmtime --get file) &&
>   	test $old -lt $new
>   '

I wanted to report the issue to Travis CI, and in order to do so I
turned the above test case into a script that doesn't at all depend on
out test framework and 'test-tool' but uses only coreutils commands
(with a whole lot of GNUisms...):

  cat >timestamp.sh <<-\EOF
  #!/bin/sh

  set -ex

  i=0
  while true
  do
  	printf "$i\r"
  	i=$((i + 1))

  	# set a file's mtime to one second ago
  	now=$(date "+%s.%N")
  	one_sec_ago=$(date -d "@$((${now%.*} - 1))" "+%Y%m%d%H%M%S")
  	touch -t ${one_sec_ago%??}.${one_sec_ago#????????????} file
  	# save its actual mtime
  	old=$(date -r file "+%s.%N")
  	# set its mtime to now
  	touch file
  	# the current the mtime should be different, but sometimes it isn't
  	new=$(date -r file "+%s.%N")
  	test "${old%.*}" != "${new%.*}"
  done
  EOF
  chmod u+x timestamp.sh
  ./timestamp.sh 2>out
  echo
  tail -n11 out

And this script usually fails after a few hundred iterations on my
machine:

  308
  + date +%s.%N
  + now=1584118408.002458987
  + date -d @1584118407 +%Y%m%d%H%M%S
  + one_sec_ago=20200313175327
  + touch -t 202003131753.27 file
  + date -r file +%s.%N
  + old=1584118407.000000000
  + touch file
  + date -r file +%s.%N
  + new=1584118407.997464837
  + test 1584118407 != 1584118407

Note that the mtime update at the end of the iteration results in an
mtime that is _before_ the current time at the beginning of the
iteration.  Well, I'm puzzled :)

> > Perhaps just setting the "old" time to something more than one
> > second in the past would avoid this?

Yes, setting the old timestamp 2 seconds in the past seems to be
sufficient, at least I haven't seen it fail in a few 100k repetitions.