Re: [PATCH v2] docs: drop the version constraints for sphinx and dependencies

Donald Hunter <donald.hunter@xxxxxxxxx> · Tue, 19 Mar 2024 17:59:19 +0000

On Mon, 18 Mar 2024 at 17:10, Donald Hunter <donald.hunter@xxxxxxxxx> wrote:
>
> On Mon, 18 Mar 2024 at 16:54, Vegard Nossum <vegard.nossum@xxxxxxxxxx> wrote:
> >
> > > % time make htmldocs
> > > ...
> > > real  9m0.533s
> > > user  15m38.397s
> > > sys   1m0.907s
> >
> > Was this running 'make cleandocs' (or otherwise removing the output
> > directory) in between? Sphinx is known to be slower if you already have
>
> Yes, times were after 'make cleandocs'.
>
> > an output directory with existing-but-obsolete data, I believe this is
> > the case even when switching from one Sphinx version to another. Akira
> > also wrote about the 7.x performance:
> >
> > https://lore.kernel.org/linux-doc/6e4b66fe-dbb3-4149-ac7e-8ae333d6fc9d@xxxxxxxxx/
>
> Having looked at the Sphinx code, it doesn't surprise me that
> incremental builds can have worse performance. There's probably going
> to be some speedups to be found when we go looking for them.

Following up on this, Symbol.clear_doc(docname) does a linear walk of
symbols, which impacts incremental builds. The implementation of
clear_doc() also looks broken in other ways, which I think would
further worsen incremental build performance.
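
A toy illustration of why the linear walk hurts, and what a dict-based
lookup buys. Both classes are hypothetical stand-ins; this is not the
actual Sphinx Symbol code, nor the code in the experimental fix, and
keying the dict by docname is an assumption made for the sketch:

```python
from collections import defaultdict


class LinearSymbolTable:
    """Toy stand-in: every clear_doc() call scans all symbols."""

    def __init__(self):
        self.symbols = []  # list of (name, docname) pairs

    def add(self, name, docname):
        self.symbols.append((name, docname))

    def clear_doc(self, docname):
        # Cost grows with the total number of symbols, however few
        # of them actually belong to docname.
        self.symbols = [s for s in self.symbols if s[1] != docname]


class IndexedSymbolTable:
    """Toy stand-in: a dict keyed by docname (an assumption)."""

    def __init__(self):
        self.by_doc = defaultdict(set)

    def add(self, name, docname):
        self.by_doc[docname].add(name)

    def clear_doc(self, docname):
        # One dict operation; other documents' symbols are not visited.
        self.by_doc.pop(docname, None)

    def names(self):
        return {n for names in self.by_doc.values() for n in names}
```

With the first shape, clearing k modified documents costs roughly k
full scans of the symbol table; with the second, each clear touches
only the entry for that document.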

Incremental builds also seem to do far more work than I'd expect. A
single modified .rst file is quick to build but a handful of modified
.rst files seems to trigger a far larger rebuild. That would be worth
investigating too.

> > > I have an experimental fix that uses a dict for lookups. With the fix, I
> > > consistently get times in the sub 5 minute range:
> >
> > Fantastic!

I pushed my performance changes to GitHub if you want to try them out:

https://github.com/donaldh/sphinx/tree/c-domain-speedup

I noticed that write performance (the second phase of sphinx-build) is
quite slow and doesn't really benefit from multiprocessing with -j
nn. It turns out that the bulk of the write work is done in the main
process and only the eventual writing is farmed out to forked
processes. I experimented with pushing more work out to the forked
processes (diff below) and it gives a significant speedup at the cost
of breaking index generation. It might be a viable enhancement if
indexing can be fixed by persisting the indices from the
sub-processes and merging them in the main process.
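
A minimal sketch of that merge idea, under stated assumptions: each
worker returns a partial index for its chunk and the main process
folds the partials together. The function names and the entries_for
callback are hypothetical and are not Sphinx APIs; a real fix would
have to match whatever structure the builder's indexer actually keeps:

```python
from collections import defaultdict


def write_chunk(docnames, entries_for):
    """Work a forked writer might do for one chunk of documents.

    entries_for is a hypothetical callback that maps a docname to the
    index terms found in that document.
    """
    partial = defaultdict(list)
    for docname in docnames:
        for term in entries_for(docname):
            partial[term].append(docname)
    # Return a plain dict so the result is easy to send back to the
    # main process.
    return dict(partial)


def merge_indices(partials):
    """Main-process step: fold the per-worker partial indices into one."""
    merged = defaultdict(list)
    for partial in partials:
        for term, docs in partial.items():
            merged[term].extend(docs)
    # Sort so the result does not depend on worker completion order.
    return {term: sorted(docs) for term, docs in merged.items()}
```

The sketch runs the chunks in-process; in a real build each
write_chunk() call would run in a forked worker and only the returned
dict would cross the process boundary.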

With the patch below, this is the build time I get:

% time make htmldocs SPHINXOPTS=-j12
...
real 1m58.988s
user 9m57.817s
sys 0m49.411s

Note that I get better performance with -j12 than with -jauto, which
auto-detects 24 cores.

diff --git a/sphinx/builders/__init__.py b/sphinx/builders/__init__.py
index 6afb5d4cc44d..6b203799390e 100644
--- a/sphinx/builders/__init__.py
+++ b/sphinx/builders/__init__.py
@@ -581,9 +581,11 @@ class Builder:
                 self.write_doc(docname, doctree)

     def _write_parallel(self, docnames: Sequence[str], nproc: int) -> None:
-        def write_process(docs: list[tuple[str, nodes.document]]) -> None:
+        def write_process(docs: list[str]) -> None:
             self.app.phase = BuildPhase.WRITING
-            for docname, doctree in docs:
+            for docname in docs:
+                doctree = self.env.get_and_resolve_doctree(docname, self)
+                self.write_doc_serialized(docname, doctree)
                 self.write_doc(docname, doctree)

         # warm up caches/compile templates using the first document
@@ -596,6 +598,7 @@ class Builder:

         tasks = ParallelTasks(nproc)
         chunks = make_chunks(docnames, nproc)
+        logger.info(f"_write_parallel: {len(chunks)} chunks")

         # create a status_iterator to step progressbar after writing a document
         # (see: ``on_chunk_done()`` function)
@@ -607,12 +610,7 @@ class Builder:

         self.app.phase = BuildPhase.RESOLVING
         for chunk in chunks:
-            arg = []
-            for docname in chunk:
-                doctree = self.env.get_and_resolve_doctree(docname, self)
-                self.write_doc_serialized(docname, doctree)
-                arg.append((docname, doctree))
-            tasks.add_task(write_process, arg, on_chunk_done)
+            tasks.add_task(write_process, chunk, on_chunk_done)

         # make sure all threads have finished
         tasks.join()