Re: Parallelization of shell scripts for 'configure' etc.

Alex Ameen <alex.ameen.tx@xxxxxxxxx> · Thu, 16 Jun 2022 21:44:27 -0500

Python and Perl carry a massive dependency closure; notably, this closure
depends on `autoconf` itself, so "using Python or Perl in `autoconf`"
creates a very large bootstrap paradox. However, projects that aren't
members of the Perl/Python closure could take advantage of those tools.

We recently hit a bootstrap paradox like this in `libtool`, involving the
`file(1)` command. It was not irreconcilable, but distro maintainers were
understandably concerned about its impact on reproducibility.

On Thu, Jun 16, 2022, 6:08 PM Demi Marie Obenour <demiobenour@xxxxxxxxx>
wrote:

> On 6/14/22 16:36, Richard Purdie wrote:
> > On Tue, 2022-06-14 at 13:11 -0400, Nick Bowler wrote:
> >> The resulting config.h is correct but pa.sh took almost 1 minute to run
> >> the configure script, about ten times longer than dash takes to run the
> >> same script.  More than half of that time appears to be spent just
> >> loading the program into pa.sh, before a single shell command is
> >> actually executed.
> >
> > Thanks for sharing that, it saves me looking into it!
> >
> > I work on a cross compiling build environment (Yocto Project) and we
> > find that a large percentage of our build times (20%?) are in the
> > configure stage, either running autoreconf or configure with a 50/50
> > split between the two. We autoreconf since we change the macros in some
> > cases, e.g. libtool.
> >
> > I would love to find a way to be more efficient about this part of our
> > builds. We do already provide some cached values for some macros to try
> > and be a little more efficient.
> >
> > When I've profiled things, most of the time seems to be "fork" overhead
> > of builds having to fork new processes to run shell command pipelines.
> > I have sometimes wondered if we couldn't make code which was more
> > optimised to the common case and didn't have so much forking going on.
>
> I wonder if one could implement a shell that only created a new
> process when it absolutely had to, and which implemented many of the
> common text processing tools as builtin commands.  Subshells would
> be implemented via user-level copy-on-write, rather than relying on
> OS support for fork().
>
> Another approach would be to generate Python or Perl scripts
> in addition to shell scripts, allowing the use of the respective
> interpreters when available.  In my experience that is basically all
> the time.
>
> Finally, a small but probably noticeable improvement would come
> from dropping support for ancient platforms, such as Ultrix.  A much
> bigger win would be to use Bash or Zsh if they are installed, as that
> allows using modern shell tricks (such as [[ "$a" =~ [0-9]+ ]] and
> "${a//a/b}") that do not require forking new processes.
> --
> Sincerely,
> Demi Marie Obenour (she/her/hers)
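
For reference, the two Bash constructs named in the quoted message
(`[[ "$a" =~ [0-9]+ ]]` and `"${a//a/b}"`) can be put side by side with
the portable forms a `configure`-style script would typically use. This is
only a rough sketch, assuming Bash is the interpreter; the function names
are hypothetical and are not taken from `autoconf` or `libtool`.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, chosen for illustration.

# Portable form: replace every "a" with "b" by piping through sed.
# This forks for the pipeline and executes an external sed process.
replace_portable() {
  printf '%s\n' "$1" | sed 's/a/b/g'
}

# Bash-only form: the same replacement via parameter expansion.
# The expansion and printf are handled inside the shell itself.
replace_builtin() {
  local a=$1
  printf '%s\n' "${a//a/b}"
}

# Portable form: succeed if the argument contains a digit, using grep.
has_digits_portable() {
  printf '%s\n' "$1" | grep -Eq '[0-9]+'
}

# Bash-only form: the same check with the [[ ... =~ ... ]] builtin.
has_digits_builtin() {
  local a=$1
  [[ "$a" =~ [0-9]+ ]]
}
```

Both variants of each pair should give the same result for simple inputs
such as `banana` or `abc123`; the difference is that the builtin forms do
not need to spawn `sed` or `grep`. Neither builtin form works in a strict
POSIX shell such as dash, which is the trade-off discussed above.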