Re: [RFC PATCH v1 2/6] kernel-doc: replace kernel-doc perl parser with a pure python one (WIP)

Jani Nikula <jani.nikula@xxxxxxxxx> · Wed, 25 Jan 2017 12:24:31 +0200

On Wed, 25 Jan 2017, Jonathan Corbet <corbet@xxxxxxx> wrote:
> On Tue, 24 Jan 2017 20:52:40 +0100
> Markus Heiser <markus.heiser@xxxxxxxxxxx> wrote:
>
>> This patch is the initial merge of a pure python implementation
>> to parse kernel-doc comments and generate reST from.
>> 
>> It consist mainly of to parts, the parser module (kerneldoc.py) and the
>> sphinx-doc extension (rstKernelDoc.py). For the command line, there is
>> also a 'scripts/kerneldoc' added.::
>> 
>>    scripts/kerneldoc --help
>> 
>> The main two parts are merged 1:1 from
>> 
>>   https://github.com/return42/linuxdoc  commit 3991d3c
>> 
>> Take this as a starting point, there is a lot of work to do (WIP).
>> Since it is merged 1:1, you will also notice it's CodingStyle is (ATM)
>> not kernel compliant and it lacks a user doc ('Documentation/doc-guide').
>> 
>> I will send patches for this when the community agreed about
>> functionalities. I guess there are a lot of topics we have to agree
>> about. E.g. the py-implementation is more strict the perl one.  When you
>> build doc with the py-module you will see a lot of additional errors and
>> warnings compared to the sloppy perl one.

Markus, thanks for your work on this.

> Again, quick comments...
>
>  - I would *much* rather evolve our existing Sphinx extension in the
>    direction we want it to go than to just replace it wholesale.
>    Replacement is the wrong approach for a few reasons, including the need
>    to minimize change and preserve credit for Jani's work.  Can we work on
>    that basis, please?

I would grossly downplay the role of preserving credit for what I've
done, and put much more emphasis on the need to create a patch series
that gradually, step by step, evolves the current approach into
something better.

Excuse me for my bluntness, but I think changing everything in a single
commit, or even a few commits, is strictly not acceptable.

When I changed *small* things in scripts/kernel-doc, I would make
htmldocs before and after the change, and recursively diff the produced
output to ensure there were no surprises. We already have enough
documentation that a manual eyeballing of the output is simply not
sufficient to ensure things don't break.

The diff in output between before and after this series? 160k lines of
unified diff without context ('diff -u0 -r old new | wc -l').

Many of the changes are improvements on the result, such as using proper
<div> tags for function parameter lists etc., but clearly changing the
output should be independent of changing the parser, so we have some
chance of validating the parser.

>    Ideally at the time of merging, we would be able to build the docs with
>    *either* kerneldoc.

I'd be fine with switching over in a single commit that doesn't
drastically change the output. A drop-in replacement. But that's not the
case here.

>  - I'll have to try it out to see how noisy it is.  I'm not opposed to
>    stricter checks; indeed, they could be a good thing.  But we might want
>    to have an option so we can cut back on the noise by default.

The increase in 'make htmldocs' build log was from 1521 to 2791 lines in
my tree. Arguably there was useful extra diagnosis, but some of it was
the printouts of long lists of definitions that were not found, one per
line. So it could be condensed without losing info too.

On to performance. With the default build options the new system was
noticeably slower than the current one, with a 50% increase on my
machine. But what really caught me by surprise was that passing
SPHINXOPTS=-j5 to parallelize worked better on the current system,
making the new one a whopping 70% slower. Of course, the argument is
that the proposed parser does more and is better, but due to the
monolithic change it's impossible to pinpoint the culprit or do a proper
cost/benefit analysis on this. Again, this calls for a more broken down
series of patches to make the changes.

Finally, while I'd love to see scripts/kernel-doc go, I do have to ask
if changing roughly 3k lines of Perl to roughly 3k lines of Python (*)
really makes everything better? They both still parse everything using a
large pile of regular expressions and a clunky state machine. When I
look at the code, I'm afraid I do not get that liberating feeling of
throwing out old junk in favor of something small or elegant or even
obviously more maintainable than the old one. The new one offers more
features, but repeatedly we face the problem that it's all lumped in
together with the parser change. We should be able to look at the parser
change and the other improvements separately.

That said, perhaps having an elegant parser (perhaps based on a compiler
plugin) is incompatible with the idea of making it a bug-for-bug drop-in
replacement of the old one, and it's something we need to think about.

All in all I think the message should be clear: this needs to be split
into small, incremental changes. Just like we do everything in the
kernel.

BR,
Jani.

(*) Please do not get hung up on these numbers. The Python version does
    more in some ways, but adds more deps such as fspath that's not
    included in the figures, and the Perl version outputs more
    formats. It's not an apples to apples comparison. Let's just say
    they are somewhere in the same ballpark.

-- 
Jani Nikula, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html