Re: [PATCH 1/1] kernel-doc: Support arrays of pointers struct fields

On Mon, 05 Feb 2024, Jonathan Corbet <corbet@xxxxxxx> wrote:
> Sakari Ailus <sakari.ailus@xxxxxxxxxxxxxxx> writes:
>
>>> Sigh ... seeing more indecipherable regexes added to kernel-doc is like
>>> seeing another load of plastic bags dumped into the ocean...  it doesn't
>>> change the basic situation, but it's still sad.
>>> 
>>> Oh well, applied, thanks.
>>
>> Thanks. I have to say I feel the same...
>>
>> Regexes aren't great for parsing C, that's for sure. :-I But what are the
>> options? Write a proper parser for (a subset of) C?
>
> Every now and then I've pondered on this a bit.  There are parsers out
> there, of course; we could consider using something like tree-sitter.
> There's just two little problems:
>
> - That's a massive dependency to drag into the docs build that seems
>   unlikely to speed things up.
>
> - kernel-doc is really two parsers - one for C code, one for the
>   comment syntax.  Strangely, nobody has written a grammar for this
>   combination.
>
> A suitably motivated developer could probably create a C+kerneldoc
> grammar that would let us make a rock-solid, tree-sitter-based parser
> that would be mostly maintained by somebody else.  But that doesn't get
> us around the "adding a big dependency" problem.

After we'd made kernel-doc (the Perl script) produce rst, and
kernel-doc (the Sphinx extension) consume it, I pondered the same
questions, and wondered what it would all look like if you could just
ignore all the kernel legacy.

I've told the story before, but what I ended up with was:

- Use Python bindings for libclang to parse the source code. Clang is
  obviously a big dependency, but nowadays more people already have it
  installed, and the Python part on top is negligible.

- Don't parse the contents of the comments, at all. Treat it as pure
  rst, and let Sphinx handle it.

That's pretty much how Hawkmoth [1] got started. I never even considered
it for the kernel, because it would've been:

> <back to work now...>

Although Mesa now uses it to produce stuff like [2].
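
To make the two design points above concrete, here is a minimal sketch
of the idea. It is not Hawkmoth's actual code: the function names are
made up for illustration, and it assumes the libclang Python bindings
(clang.cindex) are installed. libclang finds the declarations and their
comments; the comment text is only stripped of its decoration and then
handed on untouched as rst.

```python
import re


def comment_to_rst(raw):
    """Strip the /** ... */ decoration; the rest is passed on as rst."""
    body = re.sub(r"^/\*\*+|\*+/$", "", raw.strip())
    lines = [re.sub(r"^\s*\* ?", "", line) for line in body.splitlines()]
    return "\n".join(lines).strip()


def documented_symbols(path, clang_args=()):
    """Yield (name, rst) for each documented declaration in path.

    Hypothetical helper; clang_args are the compiler options the file
    needs in order to parse (include paths, defines, ...).
    """
    from clang import cindex  # assumption: libclang bindings available

    tu = cindex.Index.create().parse(path, args=list(clang_args))
    for cursor in tu.cursor.walk_preorder():
        in_file = cursor.location.file and cursor.location.file.name == path
        comment = cursor.raw_comment
        if in_file and comment and comment.startswith("/**"):
            yield cursor.spelling, comment_to_rst(comment)
```

Note that nothing here interprets the comment body. Anything beyond
removing the comment markers is left to Sphinx.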

A suitably motivated developer could probably get it to work with the
kernel... Nowadays you could use Sphinx mechanisms to extend it to
convert kernel-doc style comments to rst.
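
As a rough idea of what such a conversion could look like, the sketch
below rewrites the two most common kernel-doc constructs ("@name:" and
"Return:") into rst field lists. The function name is hypothetical, it
only handles function parameters, and it assumes the comment markers
have already been stripped. How it would be hooked into Sphinx or
Hawkmoth is deliberately left out.

```python
import re

PARAM = re.compile(r"^@([\w.]+):\s*(.*)$")
RETURN = re.compile(r"^Returns?:\s*(.*)$")


def kerneldoc_to_rst(lines):
    """Convert kernel-doc @param and Return: lines to rst field lists."""
    out = []
    for line in lines:
        m = PARAM.match(line)
        if m:
            out.append(f":param {m.group(1)}: {m.group(2)}")
            continue
        m = RETURN.match(line)
        if m:
            out.append(f":return: {m.group(1)}")
            continue
        out.append(line)  # everything else is already treated as rst
    return out
```

Continuation lines, section headers such as "Context:", and struct
members would each need their own handling.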

There are a number of issues that might make it difficult, though:

- kernel-doc parses extra magic stuff like EXPORT_SYMBOL().

- all the special casing in kernel-doc dump_struct(), like

	$members =~ s/\bSTRUCT_GROUP(\(((?:(?>[^)(]+)|(?1))*)\))[^;]*;/$2/gos;

- libclang is a compiler, so you'll need to pass suitable compiler
  options, which might be difficult with all the per-directory kbuild
  magic

- it might end up being slow, because it's a compiler (although there's
  some caching to avoid parsing the same file multiple times, the way
  kernel-doc currently does)
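
For anyone trying to decipher the STRUCT_GROUP substitution quoted in
the list above: it matches STRUCT_GROUP followed by a balanced pair of
parentheses (the "(?1)" recurses into group 1), then everything up to
the next semicolon, and replaces the whole thing with the contents of
the parentheses. Python's standard re module has no recursion, so an
equivalent has to count parentheses by hand. The following is a sketch
with a made-up function name, not code from kernel-doc or Hawkmoth:

```python
import re


def unwrap_struct_group(members):
    """Replace each 'STRUCT_GROUP(<balanced>) ...;' with '<balanced>'."""
    out = []
    pos = 0
    for m in re.finditer(r"\bSTRUCT_GROUP\(", members):
        if m.start() < pos:
            continue  # nested inside a group that was already consumed
        depth = 1
        i = m.end()
        while i < len(members) and depth:
            if members[i] == "(":
                depth += 1
            elif members[i] == ")":
                depth -= 1
            i += 1
        end = members.find(";", i)
        if depth or end < 0:
            continue  # unbalanced or unterminated: leave it untouched
        out.append(members[pos:m.start()])
        out.append(members[m.end():i - 1])  # contents of the parentheses
        pos = end + 1
    out.append(members[pos:])
    return "".join(out)
```

For example, "STRUCT_GROUP(int a; void (*cb)(int);) __packed;" becomes
"int a; void (*cb)(int);".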

Anyway, I think it would be important to separate the parsing of C and
parsing of comments. It's kind of in the same bag in kernel-doc. But if
you want to cross-check, say, the parameters/members against the
documentation, you'll need the C AST while parsing the comments. And the
preprocessor tricks employed in the kernel are probably going to be a
nightmare.
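
The cross-check itself is the easy part once both sides are available.
A minimal sketch, assuming the declared parameter names have already
been extracted from the AST by whatever C parser is in use, and that
the comment uses kernel-doc's "@name:" convention (the function name is
hypothetical):

```python
import re


def check_params(declared, doc_lines):
    """Return (undocumented, stale) parameter names, each sorted."""
    documented = {m.group(1) for line in doc_lines
                  if (m := re.match(r"^@(\w+):", line))}
    declared = set(declared)
    return sorted(declared - documented), sorted(documented - declared)
```

Called as check_params(["dev", "flags"], ["@dev: Device.", "@mode: Gone."]),
it reports "flags" as undocumented and "mode" as stale. The hard part is
producing the declared list reliably from preprocessed kernel code.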

What I'm saying is, while Hawkmoth is perhaps not the right solution,
using any generic C parser will face some of the same issues regardless.


BR,
Jani.

[1] https://github.com/jnikula/hawkmoth/
[2] https://docs.mesa3d.org/isl/index.html

-- 
Jani Nikula, Intel



