Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> writes:

> Maintaining kernel-doc has been a challenge, as there aren't many
> perl developers among maintainers. Also, the logic there is too
> complex. Having lots of global variables and using pure functions
> doesn't help.
>
> Rewrite the script in Python, placing most global variables
> inside classes. This should help maintaining the script in long
> term.

[...]

> diff --git a/scripts/kernel-doc.py b/scripts/kernel-doc.py
> new file mode 100755
> index 000000000000..5cf5ed63f215
> --- /dev/null
> +++ b/scripts/kernel-doc.py
> @@ -0,0 +1,2757 @@
> +#!/usr/bin/env python3
> +# pylint: disable=R0902,R0903,R0904,R0911,R0912,R0913,R0914,R0915,R0917,R1702
> +# pylint: disable=C0302,C0103,C0301
> +# pylint: disable=C0116,C0115,W0511,W0613
> +# Copyright(c) 2025: Mauro Carvalho Chehab <mchehab@xxxxxxxxxx>.
> +# SPDX-License-Identifier: GPL-2.0

The SPDX tag is supposed to be up top, right under the shebang.

I also think you should give consideration to preserving the other
copyright notices in the Perl version.  A language translation doesn't
remove existing copyrights...who knows how much creativity went into
some of those regexes?

> +# TODO: implement warning filtering
> +
> +"""
> +kernel_doc
> +==========
> +
> +Print formatted kernel documentation to stdout
> +
> +Read C language source or header FILEs, extract embedded
> +documentation comments, and print formatted documentation
> +to standard output.
> +
> +The documentation comments are identified by the "/**"
> +opening comment mark.
> +
> +See Documentation/doc-guide/kernel-doc.rst for the
> +documentation comment syntax.
> +"""
> +
> +import argparse
> +import logging
> +import os
> +import re
> +import sys
> +
> +from datetime import datetime
> +from pprint import pformat
> +
> +from dateutil import tz
> +
> +# Local cache for regular expressions
> +re_cache = {}
> +
> +
> +class Re:

So I have to say this bugs me a bit ... the class is fine, but the
one-letter case-only difference from the standard "re" module is just
going to make the code harder for others to approach.  "kern_re" or
something like that?  Or even "kre" if you really want it to be as
short as possible.

> +    """
> +    Helper class to simplify regex declaration and usage,
> +
> +    It calls re.compile for a given pattern. It also allows adding
> +    regular expressions and define sub at class init time.
> +
> +    Regular expressions can be cached via an argument, helping to speedup
> +    searches.
> +    """

[...]

> +
> +class KernelDoc:
> +    # Parser states
> +    STATE_NORMAL = 0                # normal code
> +    STATE_NAME = 1                  # looking for function name
> +    STATE_BODY_MAYBE = 2            # body - or maybe more description
> +    STATE_BODY = 3                  # the body of the comment
> +    STATE_BODY_WITH_BLANK_LINE = 4  # the body which has a blank line
> +    STATE_PROTO = 5                 # scanning prototype
> +    STATE_DOCBLOCK = 6              # documentation block
> +    STATE_INLINE = 7                # gathering doc outside main block
> +
> +    st_name = [
> +        "NORMAL",
> +        "NAME",
> +        "BODY_MAYBE",
> +        "BODY",
> +        "BODY_WITH_BLANK_LINE",
> +        "PROTO",
> +        "DOCBLOCK",
> +        "INLINE",
> +    ]

So these ... kind of look like enums?

That's kind of it for nits ... I do have one wish that will be kind of
hard to grant overall ... for the long-term maintenance of this code,
it would be really nice if every non-trivial regex were described by a
comment explaining what it is trying to do.  It's not reasonable to
expect that as a condition for accepting this rewrite, but it sure
would be a nice goal to be working toward.

Thanks,

jon
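
P.S. To make the enum and regex-comment suggestions a bit more concrete, here is a rough sketch of what I have in mind. It is illustrative only and not taken from the patch: the names State, DOC_START and next_state are made up for the example, and the pattern is just a plausible stand-in for "a line that opens a kernel-doc comment". The enum values mirror the STATE_* constants quoted above, so the separate st_name list would no longer be needed.

```python
import re
from enum import IntEnum


class State(IntEnum):
    """Parser states; values mirror the quoted STATE_* constants."""
    NORMAL = 0                 # normal code
    NAME = 1                   # looking for function name
    BODY_MAYBE = 2             # body - or maybe more description
    BODY = 3                   # the body of the comment
    BODY_WITH_BLANK_LINE = 4   # the body which has a blank line
    PROTO = 5                  # scanning prototype
    DOCBLOCK = 6               # documentation block
    INLINE = 7                 # gathering doc outside main block


# Hypothetical example pattern (an assumption, not the patch's regex):
# match a line that contains only the kernel-doc opening mark.
# re.VERBOSE lets each piece of the pattern carry its own comment.
DOC_START = re.compile(r"""
    ^\s*      # optional leading whitespace
    /\*\*     # the literal opening comment mark
    \s*$      # nothing but whitespace up to end of line
""", re.VERBOSE)


def next_state(state, line):
    """Toy transition: leave NORMAL for NAME when a doc comment opens."""
    if state == State.NORMAL and DOC_START.match(line):
        return State.NAME
    return state


if __name__ == "__main__":
    # The member name replaces a lookup in a parallel st_name list.
    print(State(3).name)
    print(next_state(State.NORMAL, "/**").name)
```

With IntEnum the members still compare equal to the old integer values, so a conversion could presumably be done piecemeal, and debug output gets the state name for free via .name.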