Plotting with PL/R on *NIX - A HOWTO

"Josh Tolley" <eggyknap@xxxxxxxxx> · Fri, 28 Dec 2007 05:36:59 -0700

People tend to like the idea of PL/R, because R draws some awfully
nice (and frequently useful) eye candy, and it sounds cool to be able
to make such things from inside PostgreSQL. In my (probably very
limted) experience those people often find that it's harder than it
looks to make these plots work, and if Google and my ability to come
up with applicable search terms are any indication, instructions are
not readily available. Hence this HOWTO. I hope I'll be forgiven for
the cross post.

This HOWTO covers two topics: "How to get PL/R to plot something other
than PDFs", and "How to make a function return the image data".

HOW TO GET PL/R TO PLOT SOMETHING OTHER THAN PDFS

The PL/R initiate will often try to create a plot as follows:
CREATE FUNCTION plr_plot_simple() RETURNS VOID AS $$
png('test.png')
plot(c(1:10))
dev.off()
$$ LANGUAGE plr;

SImple enough, but when called, this happens:

eggyknap=# select plr_plot_simple();
ERROR:  R interpreter expression evaluation error
DETAIL:  Error in X11(paste("png::", filename, sep = ""), width,
height, pointsize,  :
  unable to start device PNG
CONTEXT:  In PL/R function plr_plot_simple

This means R is trying to connect to an X server, and can't. R
requires an X server to handle most plot devices, and PostgreSQL
typically doesn't have a DISPLAY environment variable set to tell it
where to find such a server. R's built-in pdf() device is a notable
exception, but the PDFs R will create when asked to plot anything
particularly complex are very large, and slow to render, making them
an unattractive option.

The answer is to tell PostgreSQL where to find an X server, and here
there are two alternatives: connect to an exist X server, or make a
new one. The casual desktop experimenter will probably have an X
server running on the same machine as PostgreSQL, and can simply set
the DISPLAY variable PostgreSQL uses to point to this existing server.
Production PostgreSQL servers run X servers less commonly, so an
alternative is to use Xvfb, a "virtual" X server designed for just
such an occasion. This is simple enough:

$ Xvfb :5 -screen 0 1024x768x24

This will start Xvfb as server 5, defining screen 0 on that server as
having a resolution of 1024x768 with 24-bit color depth. Users may
also need to configure Xvfb's authorization properly, using the -ac
option (allow all hosts to connect to this server), or the -auth
<file> option, where <file> is a file that contains a list of hosts
allowed to connect to this Xvfb server. If this is running on the
PostgreSQL server itself, most likely this file just contains
"localhost" on one line.

The PostgreSQL server will have to know where to connect to this
server; for the Xvfb command line given above and assuming Xvfb runs
on the same server as PostgreSQL, the DISPLAY variable is ":5.0". Most
of the time, a script in /etc/init.d calls pg_ctl to start the
database when the system boots, and the DISPLAY variable can be set
there. Users of Debian packages for PostgreSQL may find that starting
the package abstracts pg_ctl far enough away that it's difficult to
tell in what script the variable should be set; as it turns out the
packagers have created a new file for the sole purpose of managing
environment variables. In my box it's called
/etc/postgresql/8.2/main/environment, and can just contain the
following line:

DISPLAY = ":5.0"

Note that since PL/R requires a properly set R_HOME environment
variable, PL/R users may already have had to deal with setting
environment variables inside of PostgreSQL. In fact, R provides a nice
function to examine the server's environment variables, called
plr_environ().

eggyknap=# select * from plr_environ() where name = 'DISPLAY';
  name   | value
---------+-------
 DISPLAY | :5.0
(1 row)

With that environment variable set, the original function works:

eggyknap=# select plr_plot_simple();
 plr_plot_simple
-----------------

(1 row)

But where's the image? The function put it in a file called test.png,
and it turns out that file is in the server's data directory by
default. If we want it somewhere else, we need to code the png() call
accordingly. Note that the PL/R process runs as the same user as the
PostgreSQL server; that user must have proper permissions to create
the specified file in the specified location, or it won't work.

HOW TO MAKE A FUNCTION RETURN THE IMAGE DATA

Telling PostgreSQL to make exciting graphs is all well and good, but
if my end goal is to have some client somewhere (for instance, a web
application) display the graphs. It's possible to define a function
like plr_plot_simple() above, and use pg_read_file() to read the image
from the filesystem, but only superusers can call pg_read_file(), so
the client would have to connect as a superuser, which is a deal
breaker for those web applications interested in staying decently
secured. Another option is to make the PL/R function return the binary
data. WARNING: The method shown here is fairly convoluted; I'm
unconvinced that there exists no simpler/faster/better way of doing
it, but this way does work.

Although we could probably make PL/R read in the image file as a
binary object of some sort and return it, that still creates an image
file on the server somewhere. In order to avoid the filesystem
entirely, PL/R lets us use the cairoDevice and RGtk2 packages in
strange ways, as follows:

library(cairoDevice)
library(RGtk2)
pixmap <- gdkPixmapNew(w=500, h=500, depth=24)
asCairoDevice(pixmap)
plot(1:10)
plot_pixbuf <- gdkPixbufGetFromDrawable(NULL, pixmap,
pixmap$getColormap(), 0, 0, 0, 0, 500, 500)
buffer <- gdkPixbufSaveToBufferv(plot_pixbuf, "png", character(0),
character(0))$buffer

I freely admit I don't know GTK well enough to translate this
function. Thanks to Michael Lawrence on the R-help mailing list for
sample code. I *do* know that if using Xvfb, it is important that the
depth parameter in the gdkPixmapNew() function matches the color depth
of the Xvfb server, or, perhaps, one of the options specified with
Xvfb's -pixmaps argument.

At the end of this function, the buffer variable is a raw vector
containing PNG image data. I've tried using the rawToChar() function
in R to convert that vector to a string and return it to PostgreSQL as
a TEXT type, but haven't been able to make it work yet. An alternative
is to return the buffer vector directly as an array, as follows:

CREATE OR REPLACE FUNCTION plr_plot_return_data() RETURNS CHARACTER[] AS $$
library(cairoDevice)
library(RGtk2)
pixmap <- gdkPixmapNew(w=500, h=500, depth=24)
gdkDrawableSetColormap(pixmap, gdkColormapGetSystem())
asCairoDevice(pixmap)
plot(c(1:10))
plot_pixbuf <- gdkPixbufGetFromDrawable(NULL, pixmap,
pixmap$getColormap(), 0, 0, 0, 0, 500, 500)
buffer <- gdkPixbufSaveToBufferv(plot_pixbuf, "png", character(0),
character(0))$buffer
return(buffer)
$$ LANGUAGE plr;

This returns a character array containing the PNG data of this plot,
hex encoded. There are several ways to use these data; one example is
the following mod_perl script:

use DBI;
use strict;
my $r = shift;
$r->send_http_header("image/png");
my $dbh = DBI->connect("DBI:Pg:dbname=eggyknap;host=localhost", 'eggyknap', '');
die "Failed to connect: $!" unless ($dbh);
my $sth = $dbh->prepare("SELECT ARRAY_TO_STRING(plr_plot_return_data(), '')");
my @array = $sth->fetchrow_array;
$r->print(pack "H*", $array[0]);

Lines one and two load Perl's database driver and tell Perl to warn me
if I start doing dangerous things with variables, respectively. Line
three retrieves an object representing the HTTP response my script
should generate, and line four adds a content type header to that
response to say that it's PNG data I'm returning. In the next four
lines, I connect to the database and call the PL/R function defined
above, using the ARRAY_TO_STRING function to tell PostgreSQL to
convert the return value to one big string before sending it back to
the Perl script. The last line takes the data, which are still
hex-encoded, uses the pack function to decode them, and prints the
result into the HTTP response object. And on my box, anyway, it works,
though I wouldn't describe it as particularly fast. I'm guessing that
returning the data from the PL/R function as a string instead of an
array might speed things up, as well as avoiding having to pack the
data within the Perl script, but I've been unable to make those things
work. In any case, it's certainly easier than finding something other
than R if the plot is complex enough.

- Josh / eggyknap

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match