On Mon, 18 Mar 2024 at 17:10, Donald Hunter <donald.hunter@xxxxxxxxx> wrote:
>
> On Mon, 18 Mar 2024 at 16:54, Vegard Nossum <vegard.nossum@xxxxxxxxxx> wrote:
> >
> > > % time make htmldocs
> > > ...
> > > real    9m0.533s
> > > user    15m38.397s
> > > sys     1m0.907s
> >
> > Was this running 'make cleandocs' (or otherwise removing the output
> > directory) in between? Sphinx is known to be slower if you already have
>
> Yes, times were after 'make cleandocs'.
>
> > an output directory with existing-but-obsolete data, I believe this is
> > the case even when switching from one Sphinx version to another. Akira
> > also wrote about the 7.x performance:
> >
> > https://lore.kernel.org/linux-doc/6e4b66fe-dbb3-4149-ac7e-8ae333d6fc9d@xxxxxxxxx/
>
> Having looked at the Sphinx code, it doesn't surprise me that
> incremental builds can have worse performance. There's probably going
> to be some speedups to be found when we go looking for them.

Following up on this, Symbol.clear_doc(docname) does a linear walk of
symbols, which hurts incremental builds. The implementation of
clear_doc() also looks broken in other ways, which I think would further
worsen incremental build performance.

Incremental builds also seem to do far more work than I'd expect. A
single modified .rst file is quick to build, but a handful of modified
.rst files seems to trigger a far larger rebuild. That would be worth
investigating too.

> > > I have an experimental fix that uses a dict for lookups. With the fix, I
> > > consistently get times in the sub 5 minute range:
> >
> > Fantastic!

I pushed my performance changes to GitHub if you want to try them out:

https://github.com/donaldh/sphinx/tree/c-domain-speedup

I noticed that write performance (the second phase of sphinx-build) is
quite slow and doesn't really benefit from multiprocessing with -j nn.
It turns out that the bulk of the write work is done in the main
process and only the final writing step is farmed out to forked
processes.
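To make the Symbol.clear_doc() point concrete, here is a minimal sketch of the kind of change involved. It is not Sphinx's code and is not taken from the branch above: Sphinx's C-domain Symbol is a tree, whereas this assumes a flat symbol table, and Sym, LinearTable and IndexedTable are hypothetical names. The linear version rescans every symbol on each clear_doc() call; the indexed version keeps a docname-to-names reverse index, so clearing a document only touches the symbols declared in it.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """Hypothetical stand-in for a domain symbol declared in one document."""
    name: str
    docname: str


class LinearTable:
    """Baseline: every clear_doc() call rescans all symbols."""

    def __init__(self) -> None:
        self.symbols: list[Sym] = []

    def add(self, sym: Sym) -> None:
        self.symbols.append(sym)

    def clear_doc(self, docname: str) -> None:
        # O(total symbols), even if this document declared nothing.
        self.symbols = [s for s in self.symbols if s.docname != docname]


class IndexedTable:
    """Keeps name -> symbol for lookups and docname -> names for clearing.

    Invariant: every name in by_doc[d] maps to a symbol whose docname is d.
    """

    def __init__(self) -> None:
        self.by_name: dict[str, Sym] = {}
        self.by_doc: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, sym: Sym) -> None:
        old = self.by_name.get(sym.name)
        if old is not None and old.docname != sym.docname:
            # Redeclared in another document: the old one no longer owns it.
            self.by_doc[old.docname].discard(sym.name)
        self.by_name[sym.name] = sym
        self.by_doc[sym.docname].add(sym.name)

    def lookup(self, name: str) -> Sym | None:
        # Dict lookup instead of a walk over all symbols.
        return self.by_name.get(name)

    def clear_doc(self, docname: str) -> None:
        # Only touches the names this document declared.
        for name in self.by_doc.pop(docname, set()):
            del self.by_name[name]
```

When k modified documents are cleared during an incremental build, the linear version costs roughly k times the total number of symbols, while the indexed one is proportional to the symbols actually removed.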
I experimented with pushing more work out to the forked processes (diff
below), and it gives a significant speedup at the cost of breaking index
generation. It might be a viable enhancement if indexing can be fixed by
persisting the indices from the sub-processes and merging them in the
main process.

With the patch below, this is the build time I get:

% time make htmldocs SPHINXOPTS=-j12
...
real    1m58.988s
user    9m57.817s
sys     0m49.411s

Note that I get better performance with -j12 than with -jauto, which
auto-detects 24 cores.

diff --git a/sphinx/builders/__init__.py b/sphinx/builders/__init__.py
index 6afb5d4cc44d..6b203799390e 100644
--- a/sphinx/builders/__init__.py
+++ b/sphinx/builders/__init__.py
@@ -581,9 +581,11 @@ class Builder:
                 self.write_doc(docname, doctree)
 
     def _write_parallel(self, docnames: Sequence[str], nproc: int) -> None:
-        def write_process(docs: list[tuple[str, nodes.document]]) -> None:
+        def write_process(docs: list[str]) -> None:
             self.app.phase = BuildPhase.WRITING
-            for docname, doctree in docs:
+            for docname in docs:
+                doctree = self.env.get_and_resolve_doctree(docname, self)
+                self.write_doc_serialized(docname, doctree)
                 self.write_doc(docname, doctree)
 
         # warm up caches/compile templates using the first document
@@ -596,6 +598,7 @@ class Builder:
 
         tasks = ParallelTasks(nproc)
         chunks = make_chunks(docnames, nproc)
+        logger.info(f"_write_parallel: {len(chunks)} chunks")
 
         # create a status_iterator to step progressbar after writing a document
         # (see: ``on_chunk_done()`` function)
@@ -607,12 +610,7 @@ class Builder:
 
         self.app.phase = BuildPhase.RESOLVING
         for chunk in chunks:
-            arg = []
-            for docname in chunk:
-                doctree = self.env.get_and_resolve_doctree(docname, self)
-                self.write_doc_serialized(docname, doctree)
-                arg.append((docname, doctree))
-            tasks.add_task(write_process, arg, on_chunk_done)
+            tasks.add_task(write_process, chunk, on_chunk_done)
 
         # make sure all threads have finished
         tasks.join()
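One way the indexing breakage might be addressed, following the suggestion above: have each worker return the index entries it built and merge them in the main process. Anything a forked child accumulates in memory is discarded when it exits, so it has to travel back as a return value. This is a minimal sketch under assumptions: index_chunk(), merge_indices() and the term-to-docnames PartialIndex shape are hypothetical and far simpler than Sphinx's real search index.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical partial index: term -> set of docnames that mention it.
PartialIndex = dict[str, set[str]]


def index_chunk(docs: dict[str, str]) -> PartialIndex:
    """Worker side (hypothetical): index one chunk of documents and return
    the result instead of mutating builder state that dies with the fork."""
    partial: defaultdict[str, set[str]] = defaultdict(set)
    for docname, text in docs.items():
        for word in text.lower().split():
            partial[word].add(docname)
    return dict(partial)


def merge_indices(partials: list[PartialIndex]) -> dict[str, list[str]]:
    """Main-process side: union per-chunk results into one index with
    sorted terms and sorted docname lists, so the output is deterministic
    regardless of the order in which chunks finish."""
    merged: defaultdict[str, set[str]] = defaultdict(set)
    for partial in partials:
        for term, docnames in partial.items():
            merged[term] |= docnames
    return {term: sorted(docs) for term, docs in sorted(merged.items())}


if __name__ == "__main__":
    chunks = [{"a": "Foo bar", "b": "bar baz"}, {"c": "foo qux"}]
    # Run sequentially here; in a real build each call would be one task.
    print(merge_indices([index_chunk(c) for c in chunks]))
```

In the patch, write_process() could return such a partial index, and on_chunk_done() could collect it, since Sphinx's ParallelTasks pipes each task's return value back to its result callback; a single merge would then run after tasks.join(). This assumes the real index data is picklable, which would need checking.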