Johannes Schindelin wrote: > Hi, > > I do not want to depend on more than necessary in msysGit, and therefore I > started to write an asciidoc replacement. > > So here it is: a perl script that does a good job on many .txt files in > Documentation/, although for some it deviates from "make man"'s output, > and for others it is outright broken. It is meant to be run in > Documentation/. > > My intention is not to fix the script for all cases, but to make patches > to Documentation/*.txt themselves, so that they are more consistent (and > incidentally nicer to the script). > > Now, I hear you already moan: "But Dscho, you know you suck at Perl!" > > Yeah, I know, but maybe instead of bashing on me (pun intended), you may > want to enlighten me with tips how to make it nicer to read. (Yes, there > are no comments; yes, I will gladly add them where appropriate; yes, html > is just a stub.) > > So here, without further ado, da script: It's pretty good, I certainly wouldn't have trouble reading or maintaining it, but I'll give you suggestions anyway. nice work, replacing a massive XML/XSL/etc stack with a small Perl script ;-) Sam. > > -- snip -- > #!/usr/bin/perl Add -w for warnings, also use strict; > $conv = new man_page(); > $conv->{manual} = 'Git Manual'; > $conv->{git_version} = 'Git ' . `cat ../GIT-VERSION-FILE`; > $conv->{git_version} =~ s/GIT_VERSION = //; > $conv->{git_version} =~ s/-/\\-/; > $conv->{git_version} =~ s/\n//; > $conv->{date} = `date +%m/%d/%Y`; > $conv->{date} =~ s/\n//; > > $par = ''; > handle_file($ARGV[0]); > $conv->finish(); > > sub handle_text { this function acts on globals; make them explicit arguments to the function. > if ($par =~ /^\. /s) { > my @lines = split(/^\. /m, $par); > shift @lines; > $conv->enumeration(\@lines); > } elsif ($par =~ /^\* /s) { uncuddle your elsif's; also consider making this a "tabular ternary" with the actions in separate functions. ie $result = ( $par =~ /^\. /s ? $conv->do_enum($par) : $par =~ /^\[verse\]/ ? $conv->do_verse($par) : ... ) However I have a suspicion that your script is doing line-based parsing instead of recursive descent; I don't know whether that's the right thing for asciidoc. It's actually fairly easy to convert a grammar to code blocks using tricks from MJD's _Higher Order Perl_. Is it necessary for the asciidoc grammar? > my @lines = split(/^\* /m, $par); > shift @lines; > $conv->enumeration(\@lines, 'unnumbered'); > } elsif ($par =~ /^\[verse\]/) { > $par =~ s/\[verse\] *\n?//; > $conv->verse($par); > } elsif ($par =~ /^(\t| +)/s) { > $par =~ s/^$1//mg; > $par =~ s/^\+$//mg; > $conv->indent($par); > } elsif ($par =~ /^([^\n]*)::\n((\t| +).*)$/s) { > my ($first, $rest, $indent) = ($1, $2, $3); > $rest =~ s/^\+$//mg; > while ($rest =~ /^(.*?\n\n)--+\n(.*?\n)--+\n\n(.*)$/s) { > my ($pre, $verb, $post) = ($1, $2, $3); > > $pre =~ s/^(\t|$indent)//mg; > if ($first ne '') { > $conv->begin_item($first, $pre); > $first = ''; > } else { > $conv->normal($pre); > } > > $conv->verbatim($verb); > $rest = $post; > } > $rest =~ s/^(\t|$indent)//mg; > if ($first ne '') { > $conv->begin_item($first, $rest); > } else { > $conv->normal($rest); > } > $conv->end_item(); > } elsif ($par =~ /^-+\n(.*\n)-+\n$/s) { > $conv->verbatim($1); > } else { > $conv->normal($par); > } > $par = ''; > } > > sub handle_file { > my $in; > open($in, '<' . $_[0]); > while (<$in>) { > if (/^=+$/) { > if ($par ne '' && length($_) >= length($par)) { > $conv->header($par); > $par = ''; > next; > } > } elsif (/^-+$/) { > if ($par ne '' && length($_) >= length($par)) { > $conv->section($par); > $par = ''; > next; > } > } elsif (/^~+$/) { > if ($par ne '' && length($_) >= length($par)) { > $conv->subsection($par); > $par = ''; > next; > } > } elsif (/^\[\[(.*)\]\]$/) { > handle_text(); > $conv->anchor($1); > next; > } elsif (/^$/) { > if ($par =~ /^-+\n.*[^-]\n$/s) { > # fallthru; is verbatim, but needs more. > } elsif ($par =~ /::\n$/s) { > # is item, but needs more. > next; > } else { > handle_text(); > next; > } > } elsif (/^include::(.*)\[\]$/) { > handle_text(); > handle_file($1); > next; > } > > # convert "\--" to "--" > s/\\--/--/g; > # convert "\*" to "*" > s/\\\*/*/g; > > # handle gitlink: > s/gitlink:([^\[ ]*)\[(\d+)\]/sprintf "%s", > $conv->get_link($1, $2)/ge; > # handle link: > s/link:([^\[ ]*)\[(.+)\]/sprintf "%s", > $conv->get_link($1, $2, 'external')/ge; These REs suffer from LTS (Leaning Toothpick Syndrome). Consider using s{foo}{bar} and adding the 'x' modifier to space out groups. > > $par .= $_; > } > close($in); > handle_text(); > } > > package man_page; > > sub new { > my ($class) = @_; > my $self = { > sep => '', > links => [], > # generator => 'Home grown git txt2man converter' > generator => 'DocBook XSL Stylesheets v1.71.1 <http://docbook.sf.net/>' > }; > bless $self, $class; > return $self; > } > > sub header { > my ($self, $text) = @_; > $text =~ s/-/\\-/g; > > if ($self->{preamble_shown} == undef) { > $title = $text; > $title =~ s/\(\d+\)$//; > print '.\" Title: ' . $title > . '.\" Author: ' . "\n" > . '.\" Generator: ' . $self->{generator} . "\n" > . '.\" Date: ' . $self->{date} . "\n" > . '.\" Manual: ' . $self->{manual} . "\n" > . '.\" Source: ' . $self->{git_version} . "\n" > . '.\"' . "\n"; > } I'd consider a HERE-doc, or multi-line qq{ } more readable than this. > > $text =~ tr/a-z/A-Z/; > my $suffix = "\"$self->{date}\" \"$self->{git_version}\"" > . " \"$self->{manual}\""; Use qq{} when making strings with lots of embedded double quotes and interpolation. > $text =~ s/^(.*)\((\d+)\)$/.TH "\1" "\2" $suffix/; > print $text; > > if ($self->{preamble_shown} == undef) { > print '.\" disable hyphenation' . "\n" > . '.nh' . "\n" > . '.\" disable justification (adjust text to left' > . ' margin only)' . "\n" > . '.ad l' . "\n"; Using commas rather than "." will safe you a concat when printing to filehandles, but that's a very small nit to pick :) > $self->{preamble_shown} = 1; > } > > $self->{last_op} = 'header'; > } > > sub section { > my ($self, $text) = @_; > > $text =~ tr/a-z/A-Z/; > $text =~ s/^(.*)$/.SH "\1"/; > > print $text; > > $self->{last_op} = 'section'; > } > > sub subsection { > my ($self, $text) = @_; > > $text =~ s/^(.*)$/.SS "\1"/; > > print $text; > > $self->{last_op} = 'subsection'; > } > > sub get_link { > my ($self, $command, $section, $option) = @_; > > if ($option eq 'external') { > my $links = $self->{links}; > push(@$links, $command); > $command =~ s/\.html$//; > $command =~ s/-/ /g; > push(@$links, $command); > return '\fI' . $command . '\fR\&[1]'; > } else { > return '\fB' . $command . '\fR(' . $section . ')'; > } > } > > sub common { > my ($self, $text, $option) = @_; > > # escape backslashes, but not in "\n", "\&" or "\fB" > $text =~ s/\\(?!n|f[A-Z]|&)/\\\\/g; > # escape "-" > $text =~ s/-/\\-/g; > # handle ... > $text =~ s/(\.\.\.)/\\&\1/g; > # remove double space after full stop or comma > $text =~ s/([\.,]) /\1 /g; > > if ($option ne 'no-markup') { > # make 'italic' > $text =~ s/'([^'\n]*)'/\\fI\1\\fR/g; > # ignore ` > $text =~ s/`//g; > # make *bold* > $text =~ s/\*([^\*\n]*)\*/\\fB\1\\fR/g; > # handle <<sections> > $text =~ s/<<([^>]*)>>/the section called \\(lq\1\\(rq/g; Hmm, that regex would not match for <<foo > bar>>, if you care you'd need to write something like <<((?:[^>]+|>[^>])*)>> > } > > return $text; > } > > sub normal { > my ($self, $text) = @_; > > if ($text eq "") { > return; > } > > $text = $self->common($text); > > $text =~ s/ *\n(.)/ \1/g; > > if ($self->{last_op} eq 'normal') { > print "\n"; > } > > print $text; > > $self->{last_op} = 'normal'; > } > > sub verse { > my ($self, $text) = @_; > > $text = $self->common($text); > $text =~ s/^\t/ /mg; > > print ".sp\n.RS 4\n.nf\n" . $text . ".fi\n.RE\n"; > > $self->{last_op} = 'verse'; > } > > sub enumeration { > my ($self, $text, $option) = @_; > > my $counter = 0; > foreach $line (@$text) { > $counter++; > print ".TP 4\n" > . ($option eq 'unnumbered' ? '\(bu' : $counter . '.') > . "\n" > . $self->common($line); > } > > $self->{last_op} = 'enumeration'; > } > > sub begin_item { > my ($self, $item, $text) = @_; > > $item = $self->common($item); > $text = $self->common($text); > > $text =~ s/([^\n]) *\n([^\n])/\1 \2/g; "." is the same as [^\n] (without the 's' modifier). > > print ".PP\n" . $item . "\n.RS 4\n" . $text; > > $self->{last_op} = 'item'; > } > > sub end_item { > my ($self) = @_; > > print ".RE\n"; > > $self->{last_op} = 'end_item'; > } > > sub indent { > my ($self, $text) = @_; > > $text = $self->common($text, 'no-markup'); > $text =~ s/^\t/ /mg; > > if ($self->{last_op} eq 'normal') { > print "\n"; > } > > print ".sp\n.RS 4\n.nf\n" . $text . ".fi\n.RE\n"; > > $self->{last_op} = 'indent'; > } > > sub verbatim { > my ($self, $text) = @_; > > $text = $self->common($text, 'no-markup'); > > # convert tabs to spaces > $text =~ s/^\t/ /mg; > # remove trailing empty lines > $text =~ s/\n\n*$/\n/; > > if ($self->{last_op} eq 'normal') { > print "\n"; > } > > print ".sp\n.RS 4\n.nf\n.ft C\n" . $text . ".ft\n\n.fi\n.RE\n"; > > $self->{last_op} = 'verbatim'; > } > > sub anchor { > my ($self, $text) = @_; > > $self->{last_op} = 'anchor'; > } > > sub finish { > my ($self) = @_; > my $links = $self->{links}; > > if ($#$links >= 0) { > print '.SH "REFERENCES"' . "\n"; > my $i = 1; > while ($#$links >= 0) { just use if (@$links) and while (@$links) > my $ref = shift(@$links); > $ref =~ s/-/\\-/g; > my $label = shift(@$links); > printf (".IP \"% 2d.\" 4\n%s\n.RS 4\n\\%%%s\n.RE\n", > $i++, $label, $ref); > } > } else { > print "\n"; > } > } > > package html_page; > > sub new { > my ($class) = @_; > my $self = {}; > bless $self, $class; > return $self; > } > > -- snap -- > > Ciao, > Dscho > > P.S.: I need to catch some Zs, and do some real work, so do not be > surprised if I do not respond within the next 24 hours. > - > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html