On Wed, 25 Jan 2017, Jonathan Corbet <corbet@xxxxxxx> wrote: > On Tue, 24 Jan 2017 20:52:40 +0100 > Markus Heiser <markus.heiser@xxxxxxxxxxx> wrote: > >> This patch is the initial merge of a pure python implementation >> to parse kernel-doc comments and generate reST from. >> >> It consist mainly of to parts, the parser module (kerneldoc.py) and the >> sphinx-doc extension (rstKernelDoc.py). For the command line, there is >> also a 'scripts/kerneldoc' added.:: >> >> scripts/kerneldoc --help >> >> The main two parts are merged 1:1 from >> >> https://github.com/return42/linuxdoc commit 3991d3c >> >> Take this as a starting point, there is a lot of work to do (WIP). >> Since it is merged 1:1, you will also notice it's CodingStyle is (ATM) >> not kernel compliant and it lacks a user doc ('Documentation/doc-guide'). >> >> I will send patches for this when the community agreed about >> functionalities. I guess there are a lot of topics we have to agree >> about. E.g. the py-implementation is more strict the perl one. When you >> build doc with the py-module you will see a lot of additional errors and >> warnings compared to the sloppy perl one. Markus, thanks for your work on this. > Again, quick comments... > > - I would *much* rather evolve our existing Sphinx extension in the > direction we want it to go than to just replace it wholesale. > Replacement is the wrong approach for a few reasons, including the need > to minimize change and preserve credit for Jani's work. Can we work on > that basis, please? I would grossly downplay the role of preserving credit for what I've done, and put much more emphasis on the need to create a patch series that gradually, step by step, evolves the current approach into something better. Excuse me for my bluntness, but I think changing everything in a single commit, or even a few commits, is strictly not acceptable. When I changed *small* things in scripts/kernel-doc, I would make htmldocs before and after the change, and recursively diff the produced output to ensure there were no surprises. We already have enough documentation that a manual eyeballing of the output is simply not sufficient to ensure things don't break. The diff in output between before and after this series? 160k lines of unified diff without context ('diff -u0 -r old new | wc -l'). Many of the changes are improvements on the result, such as using proper <div> tags for function parameter lists etc., but clearly changing the output should be independent of changing the parser, so we have some chance of validating the parser. > Ideally at the time of merging, we would be able to build the docs with > *either* kerneldoc. I'd be fine with switching over in a single commit that doesn't drastically change the output. A drop-in replacement. But that's not the case here. > - I'll have to try it out to see how noisy it is. I'm not opposed to > stricter checks; indeed, they could be a good thing. But we might want > to have an option so we can cut back on the noise by default. The increase in 'make htmldocs' build log was from 1521 to 2791 lines in my tree. Arguably there was useful extra diagnosis, but some of it was the printouts of long lists of definitions that were not found, one per line. So it could be condensed without losing info too. On to performance. With the default build options the new system was noticeably slower than the current one, with a 50% increase on my machine. But what really caught me by surprise was that passing SPHINXOPTS=-j5 to parallelize worked better on the current system, making the new one a whopping 70% slower. Of course, the argument is that the proposed parser does more and is better, but due to the monolithic change it's impossible to pinpoint the culprit or do a proper cost/benefit analysis on this. Again, this calls for a more broken down series of patches to make the changes. Finally, while I'd love to see scripts/kernel-doc go, I do have to ask if changing roughly 3k lines of Perl to roughly 3k lines of Python (*) really makes everything better? They both still parse everything using a large pile of regular expressions and a clunky state machine. When I look at the code, I'm afraid I do not get that liberating feeling of throwing out old junk in favor of something small or elegant or even obviously more maintainable than the old one. The new one offers more features, but repeatedly we face the problem that it's all lumped in together with the parser change. We should be able to look at the parser change and the other improvements separately. That said, perhaps having an elegant parser (perhaps based on a compiler plugin) is incompatible with the idea of making it a bug-for-bug drop-in replacement of the old one, and it's something we need to think about. All in all I think the message should be clear: this needs to be split into small, incremental changes. Just like we do everything in the kernel. BR, Jani. (*) Please do not get hung up on these numbers. The Python version does more in some ways, but adds more deps such as fspath that's not included in the figures, and the Perl version outputs more formats. It's not an apples to apples comparison. Let's just say they are somewhere in the same ballpark. -- Jani Nikula, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html