Jacob Bachmeyer <jcb62281@xxxxxxxxx>, Sat Apr 01 2023 04:54:22 GMT+0200 (Central European Summer Time)
A quick introduction to the situation for the Autoconf list:
The Automake maintainers have encountered a bizarre issue with
sporadic random test failures, seemingly due to "disk writes not
taking effect" (as Karl Berry mentioned when starting the thread).
Bogdan appears to have traced the issue to autom4te caching and
offered a patch. I have attached a copy of Bogdan's patch.
Bogdan's patch is a subtle change: the cache is now considered stale
unless it is /newer/ than the source files, rather than being
considered stale only if the source files are newer. In short, this
patch causes the cache to be considered stale if its timestamp
/matches/ the source file, while it is currently considered valid if
the timestamps match. I am forwarding the patch to the Autoconf list
now because I concur with the change, noting that Time::HiRes is also
limited by the underlying filesystem and therefore is not a "magic
bullet" solution. Assuming the cache files are stale unless proven
otherwise is therefore correct.
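Here is a minimal Python model of the two rules, for illustration only. The real code is Perl in autom4te, and the function names below are hypothetical. Integers stand in for whole-second mtimes.

```python
def is_stale_current(cache_mtime, source_mtime):
    """Current rule: the cache is stale only if the source is strictly newer."""
    return source_mtime > cache_mtime


def is_stale_patched(cache_mtime, source_mtime):
    """Patched rule: the cache is stale unless it is strictly newer than the source."""
    return cache_mtime <= source_mtime


# Cache and configure.ac written within the same second: equal timestamps.
tie = (1680317662, 1680317662)
print("current rule says stale:", is_stale_current(*tie))
print("patched rule says stale:", is_stale_patched(*tie))
```

The two rules differ only on a tie. On a tie, the current rule reuses the cache, and the patched rule regenerates it. The cost of the patched rule is an occasional regeneration that turns out to be unnecessary.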
Thank you :)
Note again that this is _Bogdan's_ patch I am forwarding unchanged. I
did not write it (but I agree with it).
[further comments inline below]
Bogdan wrote:
Bogdan <bogdro_rep@xxxxxx>, Sun Mar 05 2023 22:31:55 GMT+0100 (Central European Standard Time)
Karl Berry <karl@xxxxxxxxxxxxxxx>, Sat Mar 04 2023 00:00:56 GMT+0100 (Central European Standard Time)
Note that 'config.h' is older (4 seconds) than './configure', which
shouldn't be the case as it should get updated with new values.
Indeed. That is the same sort of thing as I was observing with nodef.
But what (at any level) could be causing that to happen?
Files just aren't getting updated as they should be.
I haven't yet tried older releases of automake to see if their tests
succeed on the systems that are failing now. That's next on my list.
[...]
Another tip, maybe: cache again. When I compare which files are
newer than the only trace file I get in the failing 'backcompat2'
test ('autom4te.cache/traces.0'), I see that 'configure.ac' is
older than this file in the passing run, but it's newer in the
failing run. This could explain why 'configure' doesn't get updated
to put new values in config.h (in my case) - 'autom4te' thinks it's
up-to-date.
The root cause may be in 'autom4te', sub 'up_to_date':
# The youngest of the cache files must be older than the oldest of
# the dependencies.
# FIXME: These timestamps have only 1-second resolution.
# Time::HiRes fixes this, but assumes Perl 5.8 or later.
(lines 913-916 in my version).
This comment Bogdan cites is not correct: Time::HiRes could be
installed from CPAN on Perls older than 5.8, and might be missing from
a 5.8 or later installation if the distribution packager separated it
into another package. Nor is Time::HiRes guaranteed to fix the issue;
the infamous example is the FAT filesystem, where timestamps only have
2-second resolution. Either way, Time::HiRes is now used if
available, so this "FIXME" is fixed now. :-)
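Time::HiRes is no magic bullet because a comparison can only be as precise as the timestamps the filesystem actually stores. This Python sketch models storage granularity as truncation to a multiple of `granularity_ns`. Truncation rather than rounding is a simplifying assumption. A granularity of 2 s stands in for FAT, and 1 s has the same effect as comparing whole-second stat() results.

```python
NS_PER_S = 1_000_000_000


def stored_mtime_ns(true_time_ns, granularity_ns):
    """Model a filesystem that can only store multiples of granularity_ns.

    Truncation (rather than rounding) is a simplifying assumption.
    """
    return true_time_ns - (true_time_ns % granularity_ns)


# Two writes 0.6 s apart: cache at t=100.2 s, configure.ac edited at t=100.8 s.
cache_write = 100 * NS_PER_S + 200_000_000
source_edit = 100 * NS_PER_S + 800_000_000

for label, granularity in [("1 ns", 1), ("1 s", NS_PER_S), ("2 s (FAT)", 2 * NS_PER_S)]:
    cache = stored_mtime_ns(cache_write, granularity)
    source = stored_mtime_ns(source_edit, granularity)
    # A high-resolution stat() cannot recover precision the filesystem discarded.
    print(f"{label:>9}: cache={cache} source={source} tie={cache == source}")
```

Whether a tie happens depends on where the two writes fall relative to a granularity boundary. That would be consistent with the failures being sporadic rather than constant.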
Good to hear :).
I didn't comment on the comment itself ;). Time::HiRes could have
been installed on Perl < 5.8, but since then it has been one of the core
modules, right? So it *should* work for users by default, and
Autoconf wouldn't require additional installations. That was the core
message of the comment, I think.
Perhaps 'configure.ac' in the case that fails is created "not
late enough" (still within 1 second) when compared to the cache,
and the cached values are taken, generating the old version of
'configure' which, in turn, generates old versions of the output
files.
Still a guess, but maybe a bit more probable now.
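The suspected sequence can be modelled with real files. The following is a Python sketch. It assumes whole-second mtime comparison, comparable to Perl's built-in stat(), and the current "reuse the cache unless the source is strictly newer" rule. The timestamps are pinned with os.utime instead of racing the clock. The sketch does not run autom4te, the helper names are hypothetical, and the file names only mirror the thread.

```python
import os
import tempfile

NS_PER_S = 1_000_000_000


def whole_second_mtime(path):
    """Whole-second mtime, comparable to Perl's built-in stat()."""
    return os.stat(path).st_mtime_ns // NS_PER_S


def cache_considered_fresh(cache, source):
    """Current rule: reuse the cache unless the source is strictly newer."""
    return whole_second_mtime(source) <= whole_second_mtime(cache)


def cache_reused_after_edit(edit_offset_ns):
    """Write a cache at t+0.3 s, 'edit' configure.ac at t+edit_offset_ns,
    and report whether the (now stale) cache would be reused."""
    base = 1_700_000_000 * NS_PER_S  # an arbitrary whole second
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "configure.ac")
        cache = os.path.join(tmp, "traces.0")
        for path in (source, cache):
            with open(path, "w") as f:
                f.write("placeholder\n")
        # os.utime(path, ns=(atime, mtime)) pins the timestamps exactly,
        # so the outcome does not depend on how fast this script runs.
        cache_time = base + 300_000_000
        edit_time = base + edit_offset_ns
        os.utime(cache, ns=(cache_time, cache_time))
        os.utime(source, ns=(edit_time, edit_time))
        return cache_considered_fresh(cache, source)


print("edit 0.4 s after cache, cache reused:", cache_reused_after_edit(700_000_000))
print("edit 0.9 s after cache, cache reused:", cache_reused_after_edit(1_200_000_000))
```

In the first call, both writes land in the same second, so the stale cache wins. In the second call, the edit crosses a second boundary and is detected. A filesystem coarser than one second could change the second result.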
Does it work when you add '-f' to '$AUTOCONF'? It does for me -
again, about 20 sequential runs of the same set of tests and about
5 parallel runs with 4 threads. Zero failures.
I'd probably get the same result if I did a 'rm -fr
autom4te.cache' before each '$AUTOCONF' invocation.
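Either workaround is easy to script around each invocation. The following is a Python sketch with hypothetical helper names, and it assumes `autoconf` is on PATH when the command is actually run. `--force` is the long form of autoconf's `-f` ("consider all files obsolete").

```python
import shutil
from pathlib import Path


def autoconf_command(autoconf="autoconf", force=True):
    """Command line for autoconf; --force (-f) treats all files as obsolete."""
    cmd = [autoconf]
    if force:
        cmd.append("--force")
    return cmd


def purge_autom4te_cache(srcdir):
    """Equivalent of 'rm -fr autom4te.cache' in srcdir (no error if absent)."""
    shutil.rmtree(Path(srcdir) / "autom4te.cache", ignore_errors=True)
```

For example, you could call `purge_autom4te_cache(d)` and then `subprocess.run(autoconf_command(), cwd=d, check=True)`. Either step alone should be enough to keep autom4te from trusting its cache.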
[...]
More input (or noise):
1) The t/backcompat2.sh test (the only test which fails for me) is a
test which modifies configure.ac and calls $AUTOCONF several times.
2) Autom4te (part of Autoconf) has a 1-second resolution in checking
if the input files are newer than the cache.
Maybe. That comment could be wrong; the actual "sub mtime" is in
Autom4te::FileUtils. Does your version of that module use
Time::HiRes? Git indicates that use of Time::HiRes was added to
Autoconf at commit 3a9802d60156809c139e9b4620bf04917e143ee2 which is
between the 2.72a and 2.72c snapshot tags.
I'm using the Autoconf provided by my system, version 2.71 (the
official package, I assume). Autom4te::FileUtils uses the built-in
stat() function.
3) Thus, a sequence: 'autoconf' + quickly modify configure.ac +
quickly run 'autoconf' may cause autom4te to use the old values from
the cache instead of processing the new configure.ac. "Quickly"
means within the same second.
It might be broader than that if your version is already using
Time::HiRes.
Time::HiRes isn't used, but you may be right.
If so, what filesystems are involved?
Ext4 here.
I could see a possible bug where multiple writes get the same mtime if
they get flushed to disk together.
Perhaps. It depends on when the OS updates the mtime. If it is updated
on write (even if the data only reaches the disk cache), it should work.
If it is updated on the physical flush, you have a point. It may depend
on the OS, the filesystem type, and maybe other factors as well. It's
better to be safe :)
Time::HiRes will not help if this happens;
your patch will work around such a bug.
4) I ran the provided list of tests (t/backcompat2.sh,
t/backcompat3.sh, t/get-sysconf.sh, t/lex-depend.sh, t/nodef.sh,
t/remake-aclocal-version-mismatch.sh, t/subdir-add2-pr46.sh,
t/testsuite-summary-reference-log.sh) in batches of 20 or more runs.
5) With the tools as they are on my system, I got a failure in the
t/backcompat2.sh test in the first batch (18th round, IIRC).
6) I modified my autom4te using the attached patch, which
essentially makes the mentioned sub 'up_to_date' work as if the
cache is out of date if its modtime (up to 1-second precision) is
not only earlier, but also equal to the modtime of any dependencies
(including configure.ac).
7) After modifying autom4te, I ran 120 rounds of the same set of
tests in single-threaded mode and an additional 120 rounds in parallel
mode (-j4): a total of 240 runs of each of those 8 tests.
ZERO FAILURES.
8) I brought autom4te to its original state and started running the
tests again. I got the first failure quite early (32nd run, IIRC).
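Points 4) to 8) describe running a fixed set of tests for many rounds and recording which rounds fail. The following is a rough Python sketch. It assumes each batch can be launched as a single command, and `run_rounds` is a hypothetical helper, not part of Automake's test harness.

```python
import subprocess


def run_rounds(cmd, rounds):
    """Run cmd `rounds` times and return the 1-based numbers of failing rounds."""
    failures = []
    for n in range(1, rounds + 1):
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures.append(n)
    return failures
```

From an Automake build tree, something like `run_rounds(["make", "check", "TESTS=t/backcompat2.sh t/nodef.sh"], 20)` would approximate one batch. Adding `"-j4"` after `"make"` would approximate the parallel mode. Comparing failure counts with and without the patched autom4te is the experiment described above.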
[...]
What can we do about this?
- have autom4te patched and wait for the fix to reach a release (and
get installed on every possible end-user system?), and revert the
sleep until this is done,
Now is a good time to submit the patch to Autoconf, noting that it
seems to resolve a hard-to-find issue with the Automake testsuite,
since Autoconf has recently issued a snapshot and then picked up a
change I helped get into Automake to make better use of Perl's
Time::HiRes, after use of Time::HiRes was upstreamed from Autoconf.
OK, so let's see how this goes. Of course, I just made the patch the
simple way, so apply it with 'patch -p2', by manual editing, or however
you wish.
Automake tests would still need the newest (patched) Autoconf/autom4te,
so test failures may still occur on systems without the patched
version, but I guess it's "only" tests and we (or at least I) could
live with that. We won't be waiting for every ancient Linux/Unix system
to be updated just to make the test suite pass, as long as we know the
root cause.
Thank you!
Bogdan Drozdowski
--
Regards - Bogdan ('bogdro') D. (GNU/Linux & FreeDOS)