Re: [RFC PATCH v2] docs-rst: automatically convert Graphviz and SVG images

Markus Heiser <markus.heiser@xxxxxxxxxxx> · Mon, 28 Nov 2016 09:22:31 +0100

Hi Jon,

you forgot to reply ML (and other CC) ... so I will do it here.

Am 27.11.2016 um 18:54 schrieb Jonathan Corbet <corbet@xxxxxxx>:

> On Sun, 27 Nov 2016 16:25:43 +0100
> Markus Heiser <markus.heiser@xxxxxxxxxxx> wrote:
> 
>> Replacement for the sphinx ``figure`` and ``images`` directive.
>> 
>> A image (or figure) directive which make use of the *glob* notation::
>> 
>>  .. figure::  hello.*
>> 
>> will be converted automatically with the tool chains listed below.:
>> 
>> * DOT to SVG: ``DOT(1)`` (http://www.graphviz.org) If graphviz is not
>>  available, the DOT language is inserted as literal-include.
>> 
>> * SVG to PDF:
>> 
>>  * CairoSVG (http://cairosvg.org) if installed or alternatively
>>  * ``convert(1)``: ImageMagick (https://www.imagemagick.org)
>> 
>> Signed-off-by: Markus Heiser <markus.heiser@xxxxxxxxxxx>
> 
> So I think this is headed in the right direction.  I have a few thoughts,
> though...
> 
> - Should we really replace the existing sphinx directives, or make a new
>  one?  I'd be inclined toward the latter just to minimize the potential
>  for confusion.

Since this is for "figure" AND "image" (inline) I thought it is
better to replace. But I'am open to change this .. concrete ideas?

> - We don't want to do the conversion unconditionally, right?  SVG works
>  just fine for HTML output, we just need to do it for latex, as I
>  understand it.

Yes, you are right. But Sphinx (docutils) is split into a reader and
a writer part. For more details see "Build Phases" in the Sphinx
extension tutorial:

  http://www.sphinx-doc.org/en/1.4.9/extdev/tutorial.html#build-phases

This patch applies to the reading-phase, where it is not clear if you
build HTML or LaTeX output. Think about the environment with the parsed
doctrees, which is build up in this phase and (re-)used for any output.

When it comes to phase 4 (Writing) the "writer" (latex2e, html, etc) 
decide which formats of images will be the best, for this the writer
looks at the reST-source folder which formats available.

OK, by this explanation, you see, we have to convert in the writer
phase, but this will bring some other drawbacks. E.g. we have 
to touch all writers (html, latex2e etc.), this means we have to write
our own builders for for each output format. If we wan't to be academic
this might be the best solution, but IMO maintaining own builder
is not what we want, so I choose the more *pragmatic* solution,
creating all potential formats in he reading phase  ...  any ideas ?

> - I don't get the ".*" syntax.  Why not just list the source file by its
>  actual name, then do the magic to use the converted version (if needed)
>  behind the scenes?

My thought was: 

  with ".*" the author marks this image for 'auto-conversion'.

Sphinx itself does not convert any images, Sphinx's concept
normally is, that the author commits all image formats to the
source tree ... but this is not what Linus wants to see in the
source tree.

With 'auto-conversion' and preprocessing content in general
we often have trouble ....

Sorry, when I go a bit OT, but often I think, that the Linux build
and the Sphinx (reST) concept are orthogonal, when it comes to file
locations and this is why we get so often in trouble with 
'auto-conversion' and preprocessing content in general ...

a.) Linux build places all intermediate files under the $PWD folder,
   which might be O=/tmp/kernel or something else

b.) Sphinx build expected all files under the 'sourcedir' folder

 Usage: sphinx-build [options] sourcedir outdir [filenames...]

And if we look at the ".pyc" discussion it is the same:

c.) Linux places ".obj" files under the $PWD folder

d.) python byte-compiles the files *in-place*, which means in 
   kernel's source, even if O=/tmp/kernel is set or not

Thats why I voted for "to copy" files in [1], even if I'am
not happy with such copy solutions ... 

[1] https://www.mail-archive.com/linux-doc@xxxxxxxxxxxxxxx/msg07655.html

I don't want to rewind to this discussion, I agree with what Jani
says, I only wan't to remember about the deviation of these two build
systems.

>> Documentation/conf.py              |   2 +-
>> Documentation/doc-guide/hello.dot  |   3 +
>> Documentation/doc-guide/sphinx.rst |  18 ++++++
>> Documentation/sphinx/figure.py     | 123 +++++++++++++++++++++++++++++++++++++
>> 4 files changed, 145 insertions(+), 1 deletion(-)
>> create mode 100644 Documentation/doc-guide/hello.dot
>> create mode 100644 Documentation/sphinx/figure.py
>> 
>> diff --git a/Documentation/conf.py b/Documentation/conf.py
>> index 1ac958c..386d792 100644
>> --- a/Documentation/conf.py
>> +++ b/Documentation/conf.py
>> @@ -34,7 +34,7 @@ from load_config import loadConfig
>> # Add any Sphinx extension module names here, as strings. They can be
>> # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
>> # ones.
>> -extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain']
>> +extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'figure']
>> 
>> # The name of the math extension changed on Sphinx 1.4
>> if major == 1 and minor > 3:
>> diff --git a/Documentation/doc-guide/hello.dot b/Documentation/doc-guide/hello.dot
>> new file mode 100644
>> index 0000000..504621d
>> --- /dev/null
>> +++ b/Documentation/doc-guide/hello.dot
>> @@ -0,0 +1,3 @@
>> +graph G {
>> +      Hello -- World
>> +}
>> diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst
>> index 96fe7ccb..6f8fdd1 100644
>> --- a/Documentation/doc-guide/sphinx.rst
>> +++ b/Documentation/doc-guide/sphinx.rst
>> @@ -217,3 +217,21 @@ Rendered as:
>>       * .. _`last row`:
>> 
>>         - column 3
>> +
>> +Figures
>> +=======
>> +
>> +If you want to add an image on either Graphviz or SVG format, you should
>> +use Sphinx ``figure``, where the name of the image file should end with
>> +``.*``, like::
>> +
>> +  .. figure::  hello.*
>> +     :alt:    hello world
>> +
>> +     DOT's hello world example
>> +
>> +
>> +.. figure::  hello.*
>> +   :alt:    hello world
>> +
>> +   DOT's hello world example
>> diff --git a/Documentation/sphinx/figure.py b/Documentation/sphinx/figure.py
>> new file mode 100644
>> index 0000000..50d63bd
>> --- /dev/null
>> +++ b/Documentation/sphinx/figure.py
>> @@ -0,0 +1,123 @@
>> +# -*- coding: utf-8; mode: python -*-
>> +# pylint: disable=W0141,C0113,C0103,C0325
>> +u"""
>> +    images
>> +    ~~~~~~
>> +
>> +    Replacement for the sphinx ``figure`` and ``images`` directive.
>> +
>> +    :copyright:  Copyright (C) 2016  Markus Heiser
>> +    :license:    GPL Version 2, June 1991 see Linux/COPYING for details.
>> +
>> +    List of customizations:
>> +
>> +    * generate PDF from SVG
>> +
>> +    * generate SVG / PDF from DOT files, see
>> +      http://www.graphviz.org/content/dot-language
>> +
>> +    A image (or figure) directive which make use of the *glob* notation will be
>> +    converted automatically with the tool chains listed below.:
>> +
>> +    * DOT to SVG: ``DOT(1)`` (http://www.graphviz.org) If graphviz is not
>> +      available, the DOT language is inserted as literal-include.
>> +
>> +    * SVG to PDF:
>> +
>> +      * CairoSVG (http://cairosvg.org) if installed or alternatively
>> +      * ``convert(1)``: ImageMagick (https://www.imagemagick.org)
>> +"""
>> +
>> +import os
>> +from glob import glob
>> +import subprocess
>> +
>> +from docutils import nodes
>> +from docutils.parsers.rst import directives
>> +from docutils.parsers.rst.directives import images
>> +
>> +from sphinx.directives import patches
>> +
>> +try:
>> +    import cairosvg
>> +except ImportError:
>> +    cairosvg = None
>> +
>> +def cmd_exists(cmd):
>> +    exit_code = subprocess.call(
>> +        "type " + cmd, shell = True,
>> +        stdout = subprocess.PIPE, stderr = subprocess.PIPE )
>> +    return bool(exit_code == 0)
> 
> So I would really like to find a way to get rid of all "shell=True"
> invocations.  Here you're using a shell builtin, which makes it harder, I
> guess.  Still, a proper solution is to search through the path for a
> suitable executable file; there must be a Python library that will do
> that?

Should not be hard ... coming soon.

> 
>> +
>> +convert_exists = cmd_exists('convert')
>> +dot_exists = cmd_exists('dot')
> 
>> +def setup(app):  # pylint: disable=W0613
>> +    directives.register_directive('figure', Figure)
>> +    directives.register_directive('image', Image)
>> +
>> +def asterix_conversions(folder, node):
> 
> "Asterix" is an indomitable Gaul; "asterisk" is "*" :)  But again, I'm
> not convinced we need this step at all.
> 

OMG :-)

>> +    if not node['uri'].endswith('.*'):
>> +        return node
>> +    name = folder + os.path.sep + os.path.splitext(node['uri'])[0]
>> +
>> +    fnames = glob(name + '.*')
>> +    if (name + '.dot' in fnames):
>> +        if not dot_exists:
>> +            with open(name + '.dot', "r") as dot:
>> +                data = dot.read()
>> +                node = nodes.literal_block(data, data)
>> +        else:
>> +            dot2svg(name)
>> +
>> +    fnames = glob(name + '.*')
> 
> Assuming we keep this, why glob the same pattern twice?

The "dot2svg(name)" in the line above might create a SVG-file
and this should be converted to PDF also ... 

> 
>> +    if (name + '.svg' in fnames):
>> +        svg2pdf(name)
>> +    return node
>> +
>> +
>> +def dot2svg(fname):
>> +    cmd = "dot -Tsvg %s.dot" % fname
>> +    with open(fname + '.svg', "w") as svg:
>> +        exit_code = subprocess.call(
>> +            cmd, shell = True,
>> +            stdout = svg,
>> +            stderr = subprocess.PIPE )
>> +        svg.flush()
>> +    return bool(exit_code == 0)
> 
> ...and this is just why I don't want shell=True.  If somebody slips in a
> bogus figure directive, they just ran an arbitrary command here.  It
> really shouldn't be necessary, especially if you've already located the
> dot binary.

Will be replaced ...

>> +
>> +def svg2pdf(fname):
>> +
>> +    if cairosvg:
>> +        cairosvg.svg2pdf(url = fname + '.svg', write_to = fname + '.pdf')
>> +        return True
>> +
>> +    if convert_exists:
>> +        cmd = "convert %s.svg %s.pdf" % (fname, fname)
>> +        exit_code = subprocess.call(
>> +            cmd, shell = True,
>> +            stdout = subprocess.PIPE, stderr = subprocess.PIPE )
>> +        return bool(exit_code == 0)
>> +
>> +    return False
>> +
>> +class Image(images.Image):
>> +
>> +    def run(self):
>> +        result = images.Image.run(self)
>> +        if len(result) == 2 or isinstance(result[0], nodes.system_message):
>> +            return result
>> +        folder = os.path.dirname(self.state.document.current_source)
>> +        result[0] = asterix_conversions(folder, result[0])
>> +        return result
>> +
>> +class Figure(patches.Figure):
>> +
>> +    def run(self):
>> +        result = patches.Figure.run(self)
>> +        if len(result) == 2 or isinstance(result[0], nodes.system_message):
>> +            return result
>> +        folder = os.path.dirname(self.state.document.current_source)
>> +        result[0][0] = asterix_conversions(folder, result[0][0])
>> +        return result
>> +
> 
> Thanks for looking at this,

Thanks for your thoughts and comments :-)

--Markus--

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html