Re: [RFC PATCH v2] docs-rst: automatically convert Graphviz and SVG images

Mauro Carvalho Chehab <mchehab@xxxxxxxxxxxxxxxx> · Mon, 28 Nov 2016 07:44:46 -0200

Em Mon, 28 Nov 2016 09:22:31 +0100
Markus Heiser <markus.heiser@xxxxxxxxxxx> escreveu:

> Hi Jon,
> 
> you forgot to reply ML (and other CC) ... so I will do it here.
> 
> Am 27.11.2016 um 18:54 schrieb Jonathan Corbet <corbet@xxxxxxx>:
> 
> > On Sun, 27 Nov 2016 16:25:43 +0100
> > Markus Heiser <markus.heiser@xxxxxxxxxxx> wrote:
> >   
> >> Replacement for the sphinx ``figure`` and ``images`` directive.
> >> 
> >> A image (or figure) directive which make use of the *glob* notation::
> >> 
> >>  .. figure::  hello.*
> >> 
> >> will be converted automatically with the tool chains listed below.:
> >> 
> >> * DOT to SVG: ``DOT(1)`` (http://www.graphviz.org) If graphviz is not
> >>  available, the DOT language is inserted as literal-include.
> >> 
> >> * SVG to PDF:
> >> 
> >>  * CairoSVG (http://cairosvg.org) if installed or alternatively
> >>  * ``convert(1)``: ImageMagick (https://www.imagemagick.org)
> >> 
> >> Signed-off-by: Markus Heiser <markus.heiser@xxxxxxxxxxx>  
> > 
> > So I think this is headed in the right direction.  I have a few thoughts,
> > though...
> > 
> > - Should we really replace the existing sphinx directives, or make a new
> >  one?  I'd be inclined toward the latter just to minimize the potential
> >  for confusion.  
> 
> Since this is for "figure" AND "image" (inline) I thought it is
> better to replace. But I'am open to change this .. concrete ideas?

I agree. IMHO, the best is to keep the existing tags, as it makes
easy for people that read the ReST documentation. Of couse, in
this case, it should keep supporting all options that exist
for figure and image tags.

IMHO, it could also be interesting to have a way to do inline
includes for graphviz (just like the existing graphviz extension),
although I don't have any use case.

> > - We don't want to do the conversion unconditionally, right?  SVG works
> >  just fine for HTML output, we just need to do it for latex, as I
> >  understand it.  
> 
> Yes, you are right. But Sphinx (docutils) is split into a reader and
> a writer part. For more details see "Build Phases" in the Sphinx
> extension tutorial:
> 
>   http://www.sphinx-doc.org/en/1.4.9/extdev/tutorial.html#build-phases
> 
> This patch applies to the reading-phase, where it is not clear if you
> build HTML or LaTeX output. Think about the environment with the parsed
> doctrees, which is build up in this phase and (re-)used for any output.

Isn't there any variable visible by a reader extension or by conf.py
telling the type of output or getting the command line used when
sphinx-build is called?

IMHO, if we have a way to know that, instead of calling dot to first
generate SVG and then calling convert/cairoSVG to convert it to PDF,
it could, instead, use dot directly to convert to PDF.

> When it comes to phase 4 (Writing) the "writer" (latex2e, html, etc) 
> decide which formats of images will be the best, for this the writer
> looks at the reST-source folder which formats available.
> 
> OK, by this explanation, you see, we have to convert in the writer
> phase, but this will bring some other drawbacks. E.g. we have 
> to touch all writers (html, latex2e etc.), this means we have to write
> our own builders for for each output format. If we wan't to be academic
> this might be the best solution, but IMO maintaining own builder
> is not what we want, so I choose the more *pragmatic* solution,
> creating all potential formats in he reading phase  ...  any ideas ?
> 
> 
> > - I don't get the ".*" syntax.  Why not just list the source file by its
> >  actual name, then do the magic to use the converted version (if needed)
> >  behind the scenes?  
> 
> My thought was: 
> 
>   with ".*" the author marks this image for 'auto-conversion'.

IMHO, it would make more sense to use the actual name here,
although I'm ok with either way.

> 
> Sphinx itself does not convert any images, Sphinx's concept
> normally is, that the author commits all image formats to the
> source tree ... but this is not what Linus wants to see in the
> source tree.
> 
> With 'auto-conversion' and preprocessing content in general
> we often have trouble ....
> 
> Sorry, when I go a bit OT, but often I think, that the Linux build
> and the Sphinx (reST) concept are orthogonal, when it comes to file
> locations and this is why we get so often in trouble with 
> 'auto-conversion' and preprocessing content in general ...
> 
> a.) Linux build places all intermediate files under the $PWD folder,
>    which might be O=/tmp/kernel or something else
> 
> b.) Sphinx build expected all files under the 'sourcedir' folder
> 
>  Usage: sphinx-build [options] sourcedir outdir [filenames...]
> 
> And if we look at the ".pyc" discussion it is the same:
> 
> c.) Linux places ".obj" files under the $PWD folder
> 
> d.) python byte-compiles the files *in-place*, which means in 
>    kernel's source, even if O=/tmp/kernel is set or not
> 
> Thats why I voted for "to copy" files in [1], even if I'am
> not happy with such copy solutions ... 
> 
> [1] https://www.mail-archive.com/linux-doc@xxxxxxxxxxxxxxx/msg07655.html

Perhaps it is time to work more closely with Sphinx upstream to allow
it to be more flexible and require less hacks on future versions
to work with the Kernel. It would be great, for example, if we didn't need
to add lots of LaTeX raw code hacks just to make PDF output to work.

> I don't want to rewind to this discussion, I agree with what Jani
> says, I only wan't to remember about the deviation of these two build
> systems.
> 
> 
> >> Documentation/conf.py              |   2 +-
> >> Documentation/doc-guide/hello.dot  |   3 +
> >> Documentation/doc-guide/sphinx.rst |  18 ++++++
> >> Documentation/sphinx/figure.py     | 123 +++++++++++++++++++++++++++++++++++++
> >> 4 files changed, 145 insertions(+), 1 deletion(-)
> >> create mode 100644 Documentation/doc-guide/hello.dot
> >> create mode 100644 Documentation/sphinx/figure.py
> >> 
> >> diff --git a/Documentation/conf.py b/Documentation/conf.py
> >> index 1ac958c..386d792 100644
> >> --- a/Documentation/conf.py
> >> +++ b/Documentation/conf.py
> >> @@ -34,7 +34,7 @@ from load_config import loadConfig
> >> # Add any Sphinx extension module names here, as strings. They can be
> >> # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
> >> # ones.
> >> -extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain']
> >> +extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'figure']
> >> 
> >> # The name of the math extension changed on Sphinx 1.4
> >> if major == 1 and minor > 3:
> >> diff --git a/Documentation/doc-guide/hello.dot b/Documentation/doc-guide/hello.dot
> >> new file mode 100644
> >> index 0000000..504621d
> >> --- /dev/null
> >> +++ b/Documentation/doc-guide/hello.dot
> >> @@ -0,0 +1,3 @@
> >> +graph G {
> >> +      Hello -- World
> >> +}
> >> diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst
> >> index 96fe7ccb..6f8fdd1 100644
> >> --- a/Documentation/doc-guide/sphinx.rst
> >> +++ b/Documentation/doc-guide/sphinx.rst
> >> @@ -217,3 +217,21 @@ Rendered as:
> >>       * .. _`last row`:
> >> 
> >>         - column 3
> >> +
> >> +Figures
> >> +=======
> >> +
> >> +If you want to add an image on either Graphviz or SVG format, you should
> >> +use Sphinx ``figure``, where the name of the image file should end with
> >> +``.*``, like::
> >> +
> >> +  .. figure::  hello.*
> >> +     :alt:    hello world
> >> +
> >> +     DOT's hello world example
> >> +
> >> +
> >> +.. figure::  hello.*
> >> +   :alt:    hello world
> >> +
> >> +   DOT's hello world example
> >> diff --git a/Documentation/sphinx/figure.py b/Documentation/sphinx/figure.py
> >> new file mode 100644
> >> index 0000000..50d63bd
> >> --- /dev/null
> >> +++ b/Documentation/sphinx/figure.py
> >> @@ -0,0 +1,123 @@
> >> +# -*- coding: utf-8; mode: python -*-
> >> +# pylint: disable=W0141,C0113,C0103,C0325
> >> +u"""
> >> +    images
> >> +    ~~~~~~
> >> +
> >> +    Replacement for the sphinx ``figure`` and ``images`` directive.
> >> +
> >> +    :copyright:  Copyright (C) 2016  Markus Heiser
> >> +    :license:    GPL Version 2, June 1991 see Linux/COPYING for details.
> >> +
> >> +    List of customizations:
> >> +
> >> +    * generate PDF from SVG
> >> +
> >> +    * generate SVG / PDF from DOT files, see
> >> +      http://www.graphviz.org/content/dot-language
> >> +
> >> +    A image (or figure) directive which make use of the *glob* notation will be
> >> +    converted automatically with the tool chains listed below.:
> >> +
> >> +    * DOT to SVG: ``DOT(1)`` (http://www.graphviz.org) If graphviz is not
> >> +      available, the DOT language is inserted as literal-include.
> >> +
> >> +    * SVG to PDF:
> >> +
> >> +      * CairoSVG (http://cairosvg.org) if installed or alternatively
> >> +      * ``convert(1)``: ImageMagick (https://www.imagemagick.org)
> >> +"""
> >> +
> >> +import os
> >> +from glob import glob
> >> +import subprocess
> >> +
> >> +from docutils import nodes
> >> +from docutils.parsers.rst import directives
> >> +from docutils.parsers.rst.directives import images
> >> +
> >> +from sphinx.directives import patches
> >> +
> >> +try:
> >> +    import cairosvg
> >> +except ImportError:
> >> +    cairosvg = None
> >> +
> >> +def cmd_exists(cmd):
> >> +    exit_code = subprocess.call(
> >> +        "type " + cmd, shell = True,
> >> +        stdout = subprocess.PIPE, stderr = subprocess.PIPE )
> >> +    return bool(exit_code == 0)  
> > 
> > So I would really like to find a way to get rid of all "shell=True"
> > invocations.  Here you're using a shell builtin, which makes it harder, I
> > guess.  Still, a proper solution is to search through the path for a
> > suitable executable file; there must be a Python library that will do
> > that?  
> 
> Should not be hard ... coming soon.
> 
> >   
> >> +
> >> +convert_exists = cmd_exists('convert')
> >> +dot_exists = cmd_exists('dot')  
> >   
> >> +def setup(app):  # pylint: disable=W0613
> >> +    directives.register_directive('figure', Figure)
> >> +    directives.register_directive('image', Image)
> >> +
> >> +def asterix_conversions(folder, node):  
> > 
> > "Asterix" is an indomitable Gaul; "asterisk" is "*" :)  But again, I'm
> > not convinced we need this step at all.
> >   
> 
> OMG :-)

If we remove support for *, then the patch would need to touch the
files under media, as they currently relies on it.

I would prefer if the extension keeps supporting it, even if we
decide to use the file extension directly at the figure/image
directives.

> >> +    if not node['uri'].endswith('.*'):
> >> +        return node
> >> +    name = folder + os.path.sep + os.path.splitext(node['uri'])[0]
> >> +
> >> +    fnames = glob(name + '.*')
> >> +    if (name + '.dot' in fnames):
> >> +        if not dot_exists:
> >> +            with open(name + '.dot', "r") as dot:
> >> +                data = dot.read()
> >> +                node = nodes.literal_block(data, data)
> >> +        else:
> >> +            dot2svg(name)
> >> +
> >> +    fnames = glob(name + '.*')  
> > 
> > Assuming we keep this, why glob the same pattern twice?  
> 
> The "dot2svg(name)" in the line above might create a SVG-file
> and this should be converted to PDF also ... 

If we find a way to teach it to identify if the build is for LaTeX or
not, then we could also add a dot2pdf() function and avoid the
intermediate step of a SVG output in such case.

> 
> >   
> >> +    if (name + '.svg' in fnames):
> >> +        svg2pdf(name)
> >> +    return node
> >> +
> >> +
> >> +def dot2svg(fname):
> >> +    cmd = "dot -Tsvg %s.dot" % fname
> >> +    with open(fname + '.svg', "w") as svg:
> >> +        exit_code = subprocess.call(
> >> +            cmd, shell = True,
> >> +            stdout = svg,
> >> +            stderr = subprocess.PIPE )
> >> +        svg.flush()
> >> +    return bool(exit_code == 0)  
> > 
> > ...and this is just why I don't want shell=True.  If somebody slips in a
> > bogus figure directive, they just ran an arbitrary command here.  It
> > really shouldn't be necessary, especially if you've already located the
> > dot binary.  
> 
> Will be replaced ...
> 
> >> +
> >> +def svg2pdf(fname):
> >> +
> >> +    if cairosvg:
> >> +        cairosvg.svg2pdf(url = fname + '.svg', write_to = fname + '.pdf')
> >> +        return True
> >> +
> >> +    if convert_exists:
> >> +        cmd = "convert %s.svg %s.pdf" % (fname, fname)
> >> +        exit_code = subprocess.call(
> >> +            cmd, shell = True,
> >> +            stdout = subprocess.PIPE, stderr = subprocess.PIPE )
> >> +        return bool(exit_code == 0)
> >> +
> >> +    return False
> >> +
> >> +class Image(images.Image):
> >> +
> >> +    def run(self):
> >> +        result = images.Image.run(self)
> >> +        if len(result) == 2 or isinstance(result[0], nodes.system_message):
> >> +            return result
> >> +        folder = os.path.dirname(self.state.document.current_source)
> >> +        result[0] = asterix_conversions(folder, result[0])
> >> +        return result
> >> +
> >> +class Figure(patches.Figure):
> >> +
> >> +    def run(self):
> >> +        result = patches.Figure.run(self)
> >> +        if len(result) == 2 or isinstance(result[0], nodes.system_message):
> >> +            return result
> >> +        folder = os.path.dirname(self.state.document.current_source)
> >> +        result[0][0] = asterix_conversions(folder, result[0][0])
> >> +        return result
> >> +  
> > 
> > Thanks for looking at this,  
> 
> 
> Thanks for your thoughts and comments :-)
> 
> 
> --Markus--
> 
> 

Thanks,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html