Parallel configure

Zack Weinberg <zackw@xxxxxxxxx> · Thu, 28 Jan 2021 11:20:13 -0500

> On Mon, 25 Jan 2021, Paul Eggert wrote:
> > One other thing could be a significant performance win: if we
> > could use GNU 'make -j' to run most of the guts of the 'configure'
> > script in parallel.  Waiting for 'configure' to finish is
> > something that slows me down a lot; often times 'configure' takes
> > longer than the subsequent 'make', simply because 'configure' is
> > inherently sequential.

I'm pulling this out to its own thread because I think it's a great
idea on a bunch of levels.  It's probably the only way we could speed
up configure scripts by more than a few percentage points.  It would
take a lot of architectural changes, but they are changes that would
be valuable for other reasons -- making dependencies explicit, making
the input configure.ac easier to machine-analyze, that sort of thing.
It could be done without adding any new configure-time dependencies
besides a make with -j support, but it would also help us explore the
possibilities for new tools (since we'd have to go over basically all
of the core code anyway).  And it's a clearly defined project that
would be easy to explain to potential funders.  (In the present state
of the project, I don't think we're getting this done with volunteer
coding hours.)

We also have some experience with internally parallelized shell
scripts, from autotest, although I'd argue that that code should be
scrapped as unworkable.  On my hard drive I have about 2/3 of a
complete rewrite of the autotest main loop in terms of make -j, and
I'd recommend starting from that for configure.

On Mon, Jan 25, 2021 at 5:42 PM Bob Friesenhahn
<bfriesen@xxxxxxxxxxxxxxxxxxx> wrote:
> The challenge here is that a considerable part of configure scripts
> depend on decisions which were made before.  There is no dependency
> information currently in configure scripts.

Yes, but it may not be as bad as you think.  The "check a list of
these things" macros (e.g. AC_CHECK_FUNCS) could be internally
parallelized right now.  Many of the core macros depend only on the
things that they AC_REQUIRE, and produce information that is _usually_
consumed only by AC_OUTPUT.  And for any macro whose internal
structure is in the preferred form

AC_DEFUN([AC_CHECK_THING],
[AC_REQUIRE([AC_CHECK_OTHERTHING])]dnl
[AC_CACHE_CHECK([for Thing], [ac_cv_have_thing], [
  code to check for Thing
])
AS_IF([test $ac_cv_have_thing = yes], [
  code to report presence of Thing
])])

we know that the "code to check for Thing" shouldn't have any side
effects beyond setting the cache variable.
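
To make the "internally parallelized" idea concrete, here is a rough
shell sketch of the fan-out and merge such a list macro could do.  It
is an illustration under assumptions, not Autoconf code: check_prog,
the my_cv_prog_* names, and the one-file-per-result layout are all
hypothetical, and it uses plain background jobs plus wait rather than
make -j.

```shell
#!/bin/sh
# Sketch only: run independent checks concurrently, each in its own
# subshell, and merge the results afterward.  check_prog and the
# result-file layout are hypothetical, not Autoconf internals.

tmp=${TMPDIR:-/tmp}/parcheck.$$
mkdir "$tmp" || exit 1
trap 'rm -rf "$tmp"' 0

# A check whose only side effect is writing its own result file.
check_prog () {
  if command -v "$1" >/dev/null 2>&1; then
    echo yes
  else
    echo no
  fi > "$tmp/$1"
}

progs="sh ls no-such-program-zz"

# Fan out: one background job per check.
for p in $progs; do
  check_prog "$p" &
done
wait

# Merge sequentially, in the original order, so the messages come out
# in the same order no matter which job finished first.
for p in $progs; do
  result=`cat "$tmp/$p"`
  v=`echo "$p" | sed 's/[^a-zA-Z0-9_]/_/g'`
  eval "my_cv_prog_$v=\$result"
  echo "checking for $p... $result"
done
```

The background jobs cannot set variables in the parent shell, which
is why each one reports through a file; that restriction is also what
makes the "no side effects" property matter.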

The challenge I see is finding a declarative way to specify the
dependencies of configure.ac code like

PKG_PROG_PKG_CONFIG
if $PKG_CONFIG --atleast-pkgconfig-version 0.27; then
  PKG_INSTALLDIR
else
  PKG_INSTALLDIR_COMPAT
fi

and

AC_ARG_ENABLE([hashes],
    AS_HELP_STRING(...),
    [hashes_selected=$enableval],
    [hashes_selected=all]
)
# This code must run after AC_PROG_AWK.
hashes_enabled=`
    $AWK -f ${srcdir}/build-aux/expand-selected-hashes \
         -v SELECTED_HASHES="$hashes_selected" \
            ${srcdir}/lib/hashes.conf
`

(picking on libxcrypt's configure.ac since I wrote it myself ;-)
I don't have any great ideas here, but I also haven't thought about
it much yet.
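
Purely to illustrate what "declarative" could mean for the second
example, here is a small shell sketch that states the ordering
constraints as data and lets the POSIX tsort utility derive a valid
order.  Everything in it is an assumption made for illustration: the
step names, the deps table, and the stand-in step bodies are
hypothetical, and tsort only linearizes the steps.  In a make-based
design the same pairs would become prerequisites, and make -j could
then run unrelated steps concurrently.  It does not address the
PKG_CONFIG case, where which step runs depends on an earlier result.

```shell
#!/bin/sh
# Sketch only: declare "X must run before Y" as data and derive a valid
# order from it.  Step names and the deps table are hypothetical.

# One "before after" pair per line, the input format tsort expects.
deps='
prog_awk        hashes_enabled
arg_hashes      hashes_enabled
prog_pkgconfig  pkg_installdir
'

step_prog_awk ()       { AWK=awk; }
step_arg_hashes ()     { hashes_selected=all; }
step_hashes_enabled () {
  # Stand-in for the real expand-selected-hashes invocation.
  hashes_enabled=`$AWK -v sel="$hashes_selected" 'BEGIN { print sel }'`
}
step_prog_pkgconfig () { PKG_CONFIG=pkg-config; }
step_pkg_installdir () { pkg_installdir_done=yes; }

# tsort prints the step names so that every "before" precedes its "after".
order=`echo "$deps" | tsort`
for s in $order; do
  step_$s
done

echo "order:" $order
echo "hashes_enabled=$hashes_enabled"
```

The "must run after AC_PROG_AWK" comment from the original snippet
becomes the first line of the table, which is the kind of information
a tool could read without executing any shell code.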

zw