This is second review, with the same comments ad the one before, and containing proposal for simpler autogeneration script. On Sat, 24 Oct 2010, Thomas Rast wrote: > Add an autogeneration script that inserts links to other manpages > describing config variables, Actualy it is "Add ... and run it to generate config-vars.txt", or "Create ... and use it to ...", isn't it. > as follows: > > * parse config-vars-src.txt (as it now needs to be renamed) to find > out its current contents > > * parse each manpage source (following includes) for config variable > headers (the blocks are ignored) > > * assemble a new config-vars.txt that completes the original list with > "See linkgit:git-foo[1]" entries for all variables that were not in > it. I find this itemized list slightly hard to read (as you can see by the confusion shown in previous review of this patch). I think it would be better to describe _goal_ of this script first, and only then how it does that. For example: The Documentation/make-config-list.perl script finds all config variables that are defined in individual manpages that are missing from the list of all config variable (in git-config(1) manpage). It then generates stub[1] documentation for those missing config variables at appropriate place in Documentation/config-vars.txt, using the following form: foo.bar:: foo.baz:: See linkgit:git-foo[1]. It does that by parsing config-vars-src.txt (now source for config-vars.txt) getting list of all config variables to find later which ones are missing, parsing each manpage source provided on command line (following includes) for config variable headers to find which config variables are described directly on individual manpages, then it assembles config-vars.txt with missing variables added[2]. NOTE that as a side-effect list of config variables would be sorted in alphabetical order[3] [1] Not a best word, but I couldn't think of a better one. [2] This paragraph could be done as itemized list, as before. [3] In my proposal it is not necessary; note that some of config variables are grouped _by function_ to be close together, and your script would destroy that. > > Signed-off-by: Thomas Rast <trast@xxxxxxxxxxxxxxx> > --- > > Jakub Narebski wrote: > > Could you please resend this patch using rename detection > > (git format-patch -M)? It would make it clear what the difference > > between config-vars and config-vars-src is, if any. > > Right, sorry about that. There wasn't supposed to be any; I'm > resending what I pushed out for everyone's convenience, but the stray > change will be gone in the final version. Stray changes (there are 2 of them). > Documentation/Makefile | 9 ++- > .../{config-vars.txt => config-vars-src.txt} | 2 +- > Documentation/make-config-list.perl | 131 ++++++++++++++++++++ > 3 files changed, 140 insertions(+), 2 deletions(-) > rename Documentation/{config-vars.txt => config-vars-src.txt} (99%) > create mode 100755 Documentation/make-config-list.perl > > diff --git a/Documentation/Makefile b/Documentation/Makefile > index e117bc4..747b849 100644 > --- a/Documentation/Makefile > +++ b/Documentation/Makefile > @@ -125,7 +125,7 @@ endif > > SHELL_PATH ?= $(SHELL) > # Shell quote; > -SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH)) > +SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH)) #' > > # > # Please note that there is a minor bug in asciidoc. Stray change, doesn't belong in this commit (and probably should be skipped altogether). BTW. there was a proposal to do something like that by saving quote in a variable, and using it in shell quoting... > @@ -320,6 +320,13 @@ howto-index.txt: howto-index.sh $(wildcard howto/*.txt) > '$(SHELL_PATH_SQ)' ./howto-index.sh $(wildcard howto/*.txt) >$@+ && \ > mv $@+ $@ > > +config-vars.txt: config-vars-src.txt $(MAN1_TXT) $(MAN5_TXT) $(MAN7_TXT) > + ./make-config-list.perl --mainlist=config-vars-src.txt \ > + --ignore=config-vars.txt \ > + $(MAN1_TXT) $(MAN5_TXT) $(MAN7_TXT) \ > + > config-vars.txt+ && \ Very, very minor nitpick: wouldn't it read better as the following? + ./make-config-list.perl \ + --mainlist=config-vars-src.txt \ + --ignore=config-vars.txt \ + $(MAN1_TXT) $(MAN5_TXT) $(MAN7_TXT) \ + > config-vars.txt+ && \ > + > config-vars.txt+ && \ > + mv config-vars.txt+ config-vars.txt > + > $(patsubst %,%.html,$(ARTICLES)) : %.html : %.txt > $(QUIET_ASCIIDOC)$(ASCIIDOC) $(ASCIIDOC_EXTRA) -b xhtml11 $*.txt > > diff --git a/Documentation/config-vars.txt b/Documentation/config-vars-src.txt > similarity index 99% > rename from Documentation/config-vars.txt > rename to Documentation/config-vars-src.txt > index a8d37a7..949259c 100644 > --- a/Documentation/config-vars.txt > +++ b/Documentation/config-vars-src.txt > @@ -936,7 +936,7 @@ gitcvs.dbname:: > > gitcvs.dbdriver:: > Used Perl DBI driver. You can specify any available driver > - for this here, but it might not work. git-cvsserver is tested > + for this here, but it might not work. git-cvsserver is tested > with 'DBD::SQLite', reported to work with 'DBD::Pg', and > reported *not* to work with 'DBD::mysql'. Experimental feature. > May not contain double colons (`:`). Default: 'SQLite'. Stray change (spaces to tab), which should be done as separate "whitespace cleanup" patch. > diff --git a/Documentation/make-config-list.perl b/Documentation/make-config-list.perl > new file mode 100755 > index 0000000..f086867 > --- /dev/null > +++ b/Documentation/make-config-list.perl > @@ -0,0 +1,131 @@ > +#!/usr/bin/perl > + > +use strict; > +use warnings; > +use Getopt::Long; > + > +my %ignore; > + > +my $rc = GetOptions( > + "mainlist=s" => \my $mainlistfile, > + "ignore=s" => sub { $ignore{$_[1]} = 1 }, > + "no-sort" => \my $no_sort, > + ); Errr, wouldn't it be better to do it this way: +my $sort = 1; +my $rc = GetOptions( + "mainlist=s" => \my $mainlistfile, + "ignore=s" => \my @ignore, + "sort!" => \$sort, +); +my %ignore = map { $_ => 1 } @ignore; Though it is not necessary; I guess it is a matter of taste. BTW. does your sript work correctly with `--no-sort`? > + > +if (!$rc or (!-r $mainlistfile)) { > + print "$0 --mainlist=<mainlist> [--ignore=<ignore>...] <asciidoc_manpage>...\n"; > + exit 1; > +} I'd rather you did a better error handling: +if (!$rc || !defined $mainlistfile) { + print "$0 --mainlist=<mainlist> [--ignore=<ignore>...] <asciidoc_manpage>...\n"; + exit 1; +} and leave handling '!-r $mainlistfile' to read_varlist() subroutine. > + This subroutine needs description, I think; at least oneline comment describing it. > +sub read_varlist { > + my ($file) = @_; > + > + open my $fh, "<", $file or die "cannot open $file: $!"; You are a bit inconsistent here. read_varlist() has open my $fh, "<", $file or die "cannot open $file: $!"; while read_file() has my $fp; open $fp, '<', $name or die "open $name failed: $!"; Note that you can use the same name of filehandle (e.g. $fh), as it is local to the subroutine. > + my (%mainlist, @mainvars); > + > + my ($v, $last_v); > + my $in_block = 0; I would done this differently, and IMHO in a simpler way. Instead of generating %mainvars structure, that holds for each variable the contents of its description (in the form of list of lines), adding missing variables to it, and then (re)generating config-vars.txt from it: --mainlist=config-vars-src.txt | v @mainlist, %mainvars | | <-- %vars <-- <asciidoc_manpage>... | v @all_keys, %out | v > config-vars.txt+ wouldn't it be better to simply store where description of each variable begins in a file (line number, or file position), and then use it to inject generated description for missing variables in correct place? > + while (<$fh>) { > + if (/^(\S+)::/) { > + $v = lc $1; Note: because you are (re)generating config-vars.txt from those data structures this has an unfortunate side-effect of lowercasing all config variable names in final config-bars.txt, e.g. it would replace `core.ignoreCygwinFSTricks` with `core.ignorecygwinfstricks`. > + $in_block = 0; > + push @{$mainlist{$v}}, $_; > + push @mainvars, $v; So in the proposed implementation it would simply be: + $mainlist{lc($v)} = $.; + push @mainvars, $v; I think lc($v) is needed only in %mainlist keys. > + } elsif (/^$/ && !$in_block) { > + if (defined $last_v && !$#{$mainlist{$last_v}}) { > + $mainlist{$last_v} = $mainlist{$v}; > + } > + $last_v = $v; Besides, if I understand it correctly, that any empty line which does not precede config variable definition, and is not inside literal block, is a *bug* in AsciiDoc markup. It should be "^+$" (continuation), not "^$". This branch of conditional would be simply not required in proposed alternate solution. By the way, why you do not follow includes in this file (for example 'include::merge-config.txt[]'? > + } elsif (defined $v) { > + push @{$mainlist{$v}}, $_; > + $in_block = !$in_block if /^--$/; > + } This branch of conditional would be simply not required in proposed alternate solution. Well... assuming that we do not have lines matching /^(\S+)::/ in literal blocks (which we don't, and which you also assume). > + } > + > + close $fh or die "eh? close failed: $!"; You are a bit inconsistent here. read_varlist() has close $fh or die "eh? close failed: $!"; while read_file() has close $fp or die "close $name failed: $!"; > + > + return \%mainlist, \@mainvars; > +} > + > +my %vars; > +my %sections; I think those variables needs better names, for example %vars_manpages (as it is mapping from config variable name found in manpage to list of manpages it was found in) and %manpage_section (as it is mapping from manpage name to man section, for purpose of linking). Though perhaps %vars could be left as is... > + This subroutine needs description, I think; at least oneline comment describing it. And probably better name, like read_manpage(). > +sub read_file { > + my ($name, $main_name) = @_; > + if (!defined $main_name) { > + $main_name = $name; > + } > + my $manpage_name = $main_name; > + $manpage_name =~ s/\.txt//; Wouldn't it be simpler to pass $filename, $manpage as arguments to read_file(), rather than $filename/$name, $main_name? It would read as: +# Parse manpages for config variables and sections, following includes +sub read_manpage { + my ($filename, $manpage) = @_; + if (!defined $manpage) { + $manpage = $filename; + $manpage =~ s/\.txt//; + } And of course s/$name/$filename/ and s/$manpage_name/$manpage/. Well, the change of variable names is not really necessary. > + my $fp; > + open $fp, '<', $name or die "open $name failed: $!"; > + while (<$fp>) { > + if ($. < 5 && /^$manpage_name\((\d+)\)/) { > + $sections{$manpage_name} = $1; > + } > + if (/^([a-z0-9]+\.[a-z<>0-9.]+)::/) { > + push @{$vars{$1}}, $manpage_name; > + } > + if (/^include::\s*(\S+)\s*\[\]/ > + && !exists $ignore{$1}) { > + read_file($1, $main_name); > + } > + } > + close $fp or die "close $name failed: $!"; > +} > + > +foreach my $name (@ARGV) { > + read_file($name); > +} O.K. nicely done. This would be unchanged in proposed alternate solution. > + > +my ($mainlist, $mainvars) = read_varlist($mainlistfile); > + > +my @all_keys = @$mainvars; > +foreach my $k (keys %vars) { > + if (!exists $mainlist->{$k}) { > + push @all_keys, $k; > + } > +} I propose the following: +my @missing_vars = (); +foreach my $k (keys %vars) { + if (!exists $mainlist->{$k}) { + push @missing_vars, $k; + } +} or even +foreach my $k (keys %vars) { + if (exists $mainlist->{$k}) { + delete $vars{$k}; + } +} so that %vars would now contain only missing variables. > + > +@all_keys = sort @all_keys unless $no_sort; So if you pass `--no-sort` to this script, all missing variables would be put at the end, and if you pass it, the order of variables would be changed, isn't it? > + > +my %out; > +foreach my $k (@all_keys) { > + if (exists $mainlist->{$k}) { > + push @{$out{$k}}, @{$mainlist->{$k}}, "\n"; > + } elsif (exists $vars{$k}) { > + push @{$out{$k}}, $k . "::\n"; > + push @{$out{$k}}, "\tSee "; > + my $sep = " "; > + foreach my $p (sort @{$vars{$k}}) { > + next if ($p =~ /$mainlistfile/); > + if (!exists $sections{$p}) { > + warn "section for $p unknown"; > + $sections{$p} = 1; > + } > + push @{$out{$k}}, $sep . "linkgit:" . $p . "[" . $sections{$p} . "]"; > + $sep = ", "; > + } > + push @{$out{$k}}, ".\n\n"; > + } else { > + die "can't happen: $k not in any source"; > + } > +} > + > +for (my $i = 0; $i < $#all_keys; $i++) { > + next if $#{$out{$all_keys[$i]}} != $#{$out{$all_keys[$i+1]}}; > + my $same = 1; > + for (my $j = 1; $j <= $#{$out{$all_keys[$i]}}; $j++) { > + if ($out{$all_keys[$i]}[$j] ne $out{$all_keys[$i+1]}[$j]) { > + $same = 0; > + last; > + } > + } > + if ($same) { > + $out{$all_keys[$i]} = [$out{$all_keys[$i]}[0]]; > + } > +} > + > +foreach my $k (@all_keys) { > + print @{$out{$k}}; > +} Instead of that, I would propose the following UNTESTED code: -- 8< -- %vars = find_insertion_points($mainlist, \%vars); # %vars has now the following format: # 'name' => { # 'lineno' => line of $mainfile where description should be inserted # 'manpages' => [ <manpage_name>... ] # } # group by insertion point my %insertion_point; ## or just %insert foreach my $v (keys %vars) { push @{$insertion_point{$v->{'lineno'}}}, $v; } open my $fh, '<', $mainfile or die "cannot open '$mainfile': $!"; while (my $line = <$fh>) { if (exists $insertion_point{$.}) { print document_vars($insertion_point{$.}, \%vars); print "\n"; } print $line; } # special case: insertion after last line in $mainfile print document_vars($insertion_point{-1}, \%vars) if exists $insertion_point{-1}; close $fh or die "cannot close '$mainfile': $!"; exit 0; ## those subroutines should probably be defined earlier sub find_insertion_points { my ($mainlist, $missing_vars) = @_; my %res; ## not required if we modify %$missing_vars in-place my %all_vars = (%$mainlist, %$missing_vars); my $lineno = -1; # means after last line # reverse order because we want to find a place before which to insert # generated documentation; it is easy to find where description # of variable begins, but in general harder to find where it ends. foreach my $key (reverse sort { lc($a) cmp lc($b) } keys %all_vars) { my $val = $all_vars{$key}; if (ref $val) { # this came from %$missing_vars $res{$v} = { 'lineno' => $lineno, 'manpages' => $val }; ## or $missing_vars->{$v} instead of $res{$v} if in-place } else { # this came from %$mainlist $lineno = $value; } } return %res; ## not required if in-place } ## this is single place where we need to change code if output changes, ## e.g. to ## ## format.attach linkgit:git-format-patch[1] ## format.cc linkgit:git-format-patch[1] ## format.headers linkgit:git-format-patch[1] ## format.pretty linkgit:gitpretty[7] sub document_vars { my ($keylist, $vars) = @_; # generate output for each key now, to compact output, because it is # easier to compare strings than arrays foreach my $k (@$keylist) { $vars->{$k}{'output'} = "\tSee: ".gen_links($vars->{$k}{'manpages'}).".\n"; } my $output = ''; ## the below could be done in different way, e.g. using new array for ## sorted keys, and using 'my $k = pop @keys' etc. $keylist = [ sort @$keylist ]; for (my $i = 0; $i < @$keylist; $i++) { my $k = $keylist->[$i]; $output .= "$k::\n"; unless ($i <= $#@$keylist && $keylist->[$i]{'output'} eq $keylist->[$i+1]{'output'}) { $output .= $k->{'output'}; } } return $output; } ## we can make it smarter in the future, for example: ## 'linkgit:git-foo[1], linkgit:git-bar[1] and linkgit:git-baz[1] for details'; sub gen_links { my $manpages = shift; return join(", ", map { linkgit($_) } @$manpages); } sub linkgit { my $manpage = shift; if (!exists $manpage_section{$manpage}) { warn "section for $manpage unknown\n"; $manpage_section{$manpage} = 1; } return "linkgit:${manpage}[$manpage_section{$manpage}]"; } __END__ -- >8 -- -- Jakub Narebski Poland -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html