Re: [PATCH 3/4] scripts/kernel-doc: Track line-number of starting blocks

Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx> · Thu, 02 Jun 2016 10:51:32 +0300

On Thu, 02 Jun 2016, Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> Design is pretty simple: kernel-doc inserts breadcrumbs with line
> numbers, and sphinx picks them up. At first I went with a sphinx
> comment, but inserting those at random places seriously upsets the
> parser, and must be filtered. Hence why this version now uses "#define
> LINEO " since one of these ever escape into output it's pretty clear
> there is a bug.
>
> It seems to work well, and at least the 2-3 errors where sphinx
> complained about something that was not correct in kernel-doc text the
> line numbers matched up perfectly.

Very nice indeed!

> v2: Instead of noodling around in the parser state machine, create
> a ViewList and parse it ourselves. This seems to be the recommended
> way, per Jani's suggestion. Unfortunately it does not fix the WARNING
> gallore this patch somehow produces.

What WARNINGs do you see? I don't think I'm seeing anything new from
this patch.

There are occasional problems with document re-generation when you've
changed either kernel-doc and/or kernel-doc.py. Sphinx doesn't know the
output depends on one or the other. For now I've just run 'make
cleandocs' every time I've changed the scripts or the Makefile;
otherwise the dependencies on source files and everything seems to be
working nicely.

> Cc: Jani Nikula <jani.nikula@xxxxxxxxx>
> Cc: linux-doc@xxxxxxxxxxxxxxx
> Cc: Jonathan Corbet <corbet@xxxxxxx>
> Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
> ---
>  Documentation/sphinx/kernel-doc.py | 24 ++++++++++++++++++++++--
>  scripts/kernel-doc                 | 18 ++++++++++++++++++
>  2 files changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/sphinx/kernel-doc.py b/Documentation/sphinx/kernel-doc.py
> index 9fc2c27916a9..1ae4d927fa9d 100644
> --- a/Documentation/sphinx/kernel-doc.py
> +++ b/Documentation/sphinx/kernel-doc.py
> @@ -30,10 +30,13 @@
>  import os
>  import subprocess
>  import sys
> +import re
>  
>  from docutils import nodes, statemachine
> +from docutils.statemachine import ViewList
>  from docutils.parsers.rst import directives
>  from sphinx.util.compat import Directive
> +from sphinx.util.nodes import nested_parse_with_titles

We already import nodes, so I think using nodes.nested_parse_with_titles
is nicer.

>  
>  class KernelDocDirective(Directive):
>      """Extract kernel-doc comments from the specified file"""
> @@ -92,8 +95,25 @@ class KernelDocDirective(Directive):
>                  sys.stderr.write(err)
>  
>              lines = statemachine.string2lines(out, tab_width, convert_whitespace=True)
> -            self.state_machine.insert_input(lines, source)
> -            return []
> +            result = ViewList()
> +
> +            lineoffset = 0;
> +            for line in lines:
> +                match = re.search("#define LINENO ([0-9]*)", line)

I wonder if it would be faster to check for line[0] == '#' or
line.startswith("#define LINENO ") first before going for regexp
matching on every line.

Anyway you should match [0-9]+ to require at least one digit, and
surround the regexp with ^ and $.

> +                if match:
> +                    # sphinx counts lines from 0
> +                    lineoffset = int(match.group(1)) - 1
> +                    # we must eat our comments since the upset the markup
> +                else:
> +                    result.append(line, source, lineoffset)
> +                    lineoffset += 1
> +
> +            node = nodes.section()
> +            node.document = self.state.document
> +            nested_parse_with_titles(self.state, result, node)

I think we might like to use self.state.nested_parse(result, 0, node)
instead of nested_parse_with_titles(self.state, result, node). We should
probably prohibit use of titles in the kernel-doc comments, as those
will mess up the document structure in the document including the
kernel-doc.

> +
> +            return node.children
> +

I'd like the switch from using self.state_machine.insert_input() to
creating ViewList() and parsing that to be a separate prep change. Small
steps so we can bisect easier.

The kernel-doc changes seem sensible, but I'd like to have the line
number printouts be behind an option; it's no longer reStructuredText
with that. We could split this patch to:

1. switch from insert_input() to ViewList and return nodes
2. add lineno output support to kernel-doc, behind an option
3. pass the option to kernel-doc from kernel-doc.py, and start handling
   the line numbers in kernel-doc output

BR,
Jani.

>          except Exception as e:
>              env.app.warn('kernel-doc \'%s\' processing failed with: %s' %
>                           (" ".join(cmd), str(e)))
> diff --git a/scripts/kernel-doc b/scripts/kernel-doc
> index 4da6f952d18b..b29c2a576a8c 100755
> --- a/scripts/kernel-doc
> +++ b/scripts/kernel-doc
> @@ -411,13 +411,16 @@ my $doc_inline_end = '^\s*\*/\s*$';
>  my $export_symbol = '^\s*EXPORT_SYMBOL(_GPL)?\s*\(\s*(\w+)\s*\)\s*;';
>  
>  my %parameterdescs;
> +my %parameterdesc_start_lines;
>  my @parameterlist;
>  my %sections;
>  my @sectionlist;
> +my %section_start_lines;
>  my $sectcheck;
>  my $struct_actual;
>  
>  my $contents = "";
> +my $new_start_line = 0;
>  
>  # the canonical section names. see also $doc_sect above.
>  my $section_default = "Description";	# default section
> @@ -516,11 +519,15 @@ sub dump_section {
>  	$name = $1;
>  	$parameterdescs{$name} = $contents;
>  	$sectcheck = $sectcheck . $name . " ";
> +        $parameterdesc_start_lines{$name} = $new_start_line;
> +        $new_start_line = 0;
>      } elsif ($name eq "@\.\.\.") {
>  #	print STDERR "parameter def '...' = '$contents'\n";
>  	$name = "...";
>  	$parameterdescs{$name} = $contents;
>  	$sectcheck = $sectcheck . $name . " ";
> +        $parameterdesc_start_lines{$name} = $new_start_line;
> +        $new_start_line = 0;
>      } else {
>  #	print STDERR "other section '$name' = '$contents'\n";
>  	if (defined($sections{$name}) && ($sections{$name} ne "")) {
> @@ -530,6 +537,8 @@ sub dump_section {
>  	} else {
>  	    $sections{$name} = $contents;
>  	    push @sectionlist, $name;
> +            $section_start_lines{$name} = $new_start_line;
> +            $new_start_line = 0;
>  	}
>      }
>  }
> @@ -1775,6 +1784,7 @@ sub output_blockhead_rst(%) {
>  	if ($output_selection != OUTPUT_INCLUDE) {
>  	    print "**$section**\n\n";
>  	}
> +        print "#define LINENO " . $section_start_lines{$section} . "\n";
>  	output_highlight_rst($args{'sections'}{$section});
>  	print "\n";
>      }
> @@ -1861,6 +1871,7 @@ sub output_section_rst(%) {
>  
>      foreach $section (@{$args{'sectionlist'}}) {
>  	print "**$section**\n\n";
> +        print "#define LINENO " . $section_start_lines{$section} . "\n";
>  	output_highlight_rst($args{'sections'}{$section});
>  	print "\n";
>      }
> @@ -1958,6 +1969,7 @@ sub output_struct_rst(%) {
>  
>  	($args{'parameterdescs'}{$parameter_name} ne $undescribed) || next;
>  	$type = $args{'parametertypes'}{$parameter};
> +        print "#define LINENO " . ($parameterdesc_start_lines{$parameter_name} - 1). "\n";
>  	print "``$type $parameter``\n";
>  	output_highlight_rst($args{'parameterdescs'}{$parameter_name});
>  	print "\n";
> @@ -2745,11 +2757,14 @@ sub process_file($) {
>  	    if (/$doc_start/o) {
>  		$state = STATE_NAME;	# next line is always the function name
>  		$in_doc_sect = 0;
> +                print "#define LINENO " . $. . "\n";
>  	    }
>  	} elsif ($state == STATE_NAME) {# this line is the function name (always)
>  	    if (/$doc_block/o) {
>  		$state = STATE_DOCBLOCK;
>  		$contents = "";
> +                $new_start_line = $. + 1;
> +
>  		if ( $1 eq "" ) {
>  			$section = $section_intro;
>  		} else {
> @@ -2833,6 +2848,7 @@ sub process_file($) {
>  		$in_doc_sect = 1;
>  		$in_purpose = 0;
>  		$contents = $newcontents;
> +                $new_start_line = $.;
>  		while ((substr($contents, 0, 1) eq " ") ||
>  		       substr($contents, 0, 1) eq "\t") {
>  		    $contents = substr($contents, 1);
> @@ -2866,6 +2882,7 @@ sub process_file($) {
>  			dump_section($file, $section, xml_escape($contents));
>  			$section = $section_default;
>  			$contents = "";
> +                        $new_start_line = $.;
>  		    } else {
>  			$contents .= "\n";
>  		    }
> @@ -2900,6 +2917,7 @@ sub process_file($) {
>  	    if ($inline_doc_state == STATE_INLINE_NAME && /$doc_inline_sect/o) {
>  		$section = $1;
>  		$contents = $2;
> +                $new_start_line = $. + 1;
>  		if ($contents ne "") {
>  		    while ((substr($contents, 0, 1) eq " ") ||
>  		           substr($contents, 0, 1) eq "\t") {

-- 
Jani Nikula, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html