On Thu, Nov 21, 2019 at 09:09:37AM +0100, Rasmus Villemoes wrote:
> On 21/11/2019 01.03, Kees Cook wrote:
> > diff --git a/Documentation/sphinx/parallel-wrapper.sh b/Documentation/sphinx/parallel-wrapper.sh
> > new file mode 100644
> > index 000000000000..a416dbfd2025
> > --- /dev/null
> > +++ b/Documentation/sphinx/parallel-wrapper.sh
> > @@ -0,0 +1,25 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0+
> > +#
> > +# Figure out if we should follow a specific parallelism from the make
> > +# environment (as exported by scripts/jobserver-exec), or fall back to
> > +# the "auto" parallelism when "-jN" is not specified at the top-level
> > +# "make" invocation.
> > +
> > +sphinx="$1"
> > +shift || true
> > +
> > +parallel="${PARALLELISM:-1}"
> > +if [ ${parallel} -eq 1 ] ; then
> > +	auto=$(perl -e 'open IN,"'"$sphinx"' --version 2>&1 |";
> > +			while (<IN>) {
> > +				if (m/([\d\.]+)/) {
> > +					print "auto" if ($1 >= "1.7")
> > +				}
> > +			}
> > +			close IN')
> > +	if [ -n "$auto" ] ; then
> > +		parallel="$auto"
> > +	fi
> > +fi
> > +exec "$sphinx" "-j$parallel" "$@"
>
> I don't understand this logic. If the parent failed to claim any tokens
> (likely because the top make and its descendants are already running 16
> gcc processes), just let sphinx run #cpus jobs in parallel? That doesn't
> make sense - it gets us back to the "we've now effectively injected K
> tokens to the jobserver that weren't there originally".

I was going to say "but jobserver-exec can't be running unless there
are available slots", but I see the case is "if there are 16 slots and
jobserver-exec gets _1_, it should not fall back to 'auto'".

> From the comment above, what you want is to use "auto" if the top
> invocation was simply "make docs". Well, I kind of disagree with falling
> back to auto in that case; the user can say "make -j8 docs" and the
> wrapper is guaranteed to claim them all. But if you really want, the
> jobserver-count script needs to detect and export the "no parallelism
> requested at top level" in some way distinct from "PARALLELISM=1",
> because that's ambiguous.

Right -- failure needs to be distinct from "only 1 available".

> > +	# Read out as many jobserver slots as possible.
> > +	while True:
> > +		try:
> > +			slot = os.read(reader, 1)
> > +			jobs += slot
>
> I'd just try to slurp in 8 or 16 tokens at a time, there's no reason to
> limit to 1 in each loop.

Good point. I will change that.

> > +rc = subprocess.call(sys.argv[1:])
> > +
> > +# Return all the actually reserved slots.
> > +if len(jobs):
> > +	os.write(writer, jobs)
> > +
> > +sys.exit(rc)
>
> What happens if the child dies from a signal? Will this correctly
> forward that information?

As far as I understand, yes, signal codes are passed through via the
exit code (i.e. see WIFSIGNALED, etc).

> Similarly (and the harder problem), what happens when our parent wants
> to send its child a signal to say "stop what you're doing, return the
> tokens, brush your teeth and go to bed". We should forward that signal
> to the real job instead of just dying, losing track of both the tokens
> we've claimed as well as orphaning the child.

Hm, hm. I guess I could pass INT and TERM to the child. That seems like
the most sensible best-effort here. It seems "make" isn't only looking
at the slots to determine process management.

-- 
Kees Cook
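
For reference, here is a rough sketch of how the three changes discussed
above (reading several tokens per loop, passing INT and TERM to the
child, and reporting a signal death) might look. It is not the posted
patch. The helper names claim_slots(), run_forwarding() and
exit_status() are hypothetical, and it assumes the jobserver read end
has already been made non-blocking. One detail that seems worth
checking: subprocess reports "killed by signal N" as a returncode of -N,
and a negative value handed to sys.exit() is truncated to its low 8
bits, so the sketch maps it to the shell-style 128 + N instead.

```python
import os
import signal
import subprocess


def claim_slots(reader, chunk=8):
    """Read all currently free jobserver tokens, up to `chunk` per read.

    Assumption: `reader` is the jobserver read end and is already
    non-blocking, so an empty pipe raises BlockingIOError.
    """
    jobs = b""
    while True:
        try:
            slots = os.read(reader, chunk)
        except BlockingIOError:
            break  # pipe is empty: no more free slots right now
        if not slots:
            break  # write end was closed
        jobs += slots
    return jobs


def run_forwarding(argv, forwarded=(signal.SIGINT, signal.SIGTERM)):
    """Run argv, relay the listed signals to it, return its returncode."""
    child = subprocess.Popen(argv)

    def relay(signum, frame):
        # Best effort: pass the signal on instead of dying ourselves,
        # so the claimed tokens can still be returned afterwards.
        if child.poll() is None:
            child.send_signal(signum)

    previous = {sig: signal.signal(sig, relay) for sig in forwarded}
    try:
        return child.wait()
    finally:
        # Put the original handlers back once the child is gone.
        for sig, handler in previous.items():
            signal.signal(sig, handler)


def exit_status(returncode):
    """Map a subprocess returncode to a shell-style exit status.

    subprocess uses -N for "terminated by signal N"; 128 + N is the
    usual shell convention for reporting that to a parent.
    """
    return 128 - returncode if returncode < 0 else returncode
```

With these, the tail of the script would be roughly: jobs =
claim_slots(reader), rc = run_forwarding(sys.argv[1:]), write jobs back
to the writer if any were claimed, then sys.exit(exit_status(rc)). There
is still a small window between starting the child and installing the
handlers, so this stays best-effort.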