On Wed, 25 Jan 2017, Markus Heiser <markus.heiser@xxxxxxxxxxx> wrote: > Am 25.01.2017 um 11:24 schrieb Jani Nikula <jani.nikula@xxxxxxxxx>: > >> Markus, thanks for your work on this. > > Thanks for your comments! > >> Excuse me for my bluntness, but I think changing everything in a single >> commit, or even a few commits, is strictly not acceptable. > > OK, I understand. > >> When I changed *small* things in scripts/kernel-doc, I would make >> htmldocs before and after the change, and recursively diff the produced >> output to ensure there were no surprises. We already have enough >> documentation that a manual eyeballing of the output is simply not >> sufficient to ensure things don't break. > >> The diff in output between before and after this series? 160k lines of >> unified diff without context ('diff -u0 -r old new | wc -l'). >> >> Many of the changes are improvements on the result, such as using proper >> <div> tags for function parameter lists etc., but clearly changing the >> output should be independent of changing the parser, so we have some >> chance of validating the parser. > > > Hmm ... I try to sort my thoughts on this: > > The both parser are generating reST output. We have tested reST output > so it should be enough to compare the reST from the old one with the > new one ... at least theoretical. > > But the problem I see here is, that the perl script generates a > reST output which I can't use. As an example we can take a look at > the man-page builder I shipped in the series. Sorry, I still don't understand *why* you can't use the same rst. Your explanation seems to relate to man pages, but man pages come *afterwards*, and are a separate improvement. I know you talk about lack of proper structure and all that, but *why* can it strictly not be used, if the *current* rst clearly can be used? BR, Jani. > > https://www.mail-archive.com/linux-doc@xxxxxxxxxxxxxxx/msg09017.html > > In the commit message there is a small example: > > <section docname="basics ...> > <title>get_sd_load_idx</title> > <kernel_doc_man manpage="get_sd_load_idx.9"/> > ... > <desc desctype="function" domain="c"....> > <desc_signature ....> > <desc_type>int</desc_type> > ... > > You see that it has <section> tag with childs <title/>, <kernel_doc_man/> > and so on. This structured markup is used by builders, they navigate through > the structured tree picking up nodes and spit out some man-page html, or > whatever builder it is. > > ATM the perl parser generates a reST output which does not have such > a structured tree, so the builder can't navigate in. > > So, what I mean is, the new parser has to generate a complete different reST > output and thats why we can't compare the perl parser with python one on a reST > basis ... and if reST is different, HTML is different :( > > So we do not have any chance to track regression when switching from > the old to the new parser. > > Thats are my thoughts on this topic, may be you have a solution for this? > >> >>> Ideally at the time of merging, we would be able to build the docs with >>> *either* kerneldoc. >> >> I'd be fine with switching over in a single commit that doesn't >> drastically change the output. > > One solution might be to improve the reST output of the perl script > first, so that it produce something which has a structure and we > all can agree on (short: reST output is the reference, ATM the reference > need some improvements) > > If this is a way we like to go, I can send a patch for the perl script, > so that we can commit one a reST reference. > >> A drop-in replacement. But that's not the >> case here. >> >>> - I'll have to try it out to see how noisy it is. I'm not opposed to >>> stricter checks; indeed, they could be a good thing. But we might want >>> to have an option so we can cut back on the noise by default. >> >> The increase in 'make htmldocs' build log was from 1521 to 2791 lines in >> my tree. Arguably there was useful extra diagnosis, but some of it was >> the printouts of long lists of definitions that were not found, one per >> line. So it could be condensed without losing info too. > > Yes, this was just a 1:1 merge from my POC, there are a lot of things > which could be meld down. ATM, for me it is important to get a feedback > on the functionalities and concepts of kernel-doc apps (RFC). > >> On to performance. With the default build options the new system was >> noticeably slower than the current one, with a 50% increase on my >> machine. But what really caught me by surprise was that passing >> SPHINXOPTS=-j5 to parallelize worked better on the current system, >> making the new one a whopping 70% slower. Of course, the argument is >> that the proposed parser does more and is better, but due to the >> monolithic change it's impossible to pinpoint the culprit or do a proper >> cost/benefit analysis on this. Again, this calls for a more broken down >> series of patches to make the changes. > > Ups, I have to look closer ... I thought the py-solution is faster > since it does not for processes and does some caching. > >> Finally, while I'd love to see scripts/kernel-doc go, I do have to ask >> if changing roughly 3k lines of Perl to roughly 3k lines of Python (*) >> really makes everything better? They both still parse everything using a >> large pile of regular expressions and a clunky state machine. When I >> look at the code, I'm afraid I do not get that liberating feeling of >> throwing out old junk in favor of something small or elegant or even >> obviously more maintainable than the old one. The new one offers more >> features, but repeatedly we face the problem that it's all lumped in >> together with the parser change. We should be able to look at the parser >> change and the other improvements separately. >> >> That said, perhaps having an elegant parser (perhaps based on a compiler >> plugin) is incompatible with the idea of making it a bug-for-bug drop-in >> replacement of the old one, and it's something we need to think about. > > Before I started implementing the parser I thought about separating > parsing from generating reST. I played a bit with pycparser > > https://github.com/eliben/pycparser > > but I realized that the coverage of those parser might be not > enough for the kernel sources. At this time you mentioned sparse. > I haven't had time to at sparse but I guess that this is the > tool. > > -- Markus -- > > >> All in all I think the message should be clear: this needs to be split >> into small, incremental changes. Just like we do everything in the >> kernel. >> >> >> BR, >> Jani. >> >> >> (*) Please do not get hung up on these numbers. The Python version does >> more in some ways, but adds more deps such as fspath that's not >> included in the figures, and the Perl version outputs more >> formats. It's not an apples to apples comparison. Let's just say >> they are somewhere in the same ballpark. >> >> -- >> Jani Nikula, Intel Open Source Technology Center > -- Jani Nikula, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html