People tend to like the idea of PL/R, because R draws some awfully nice (and frequently useful) eye candy, and it sounds cool to be able to make such things from inside PostgreSQL. In my (probably very limted) experience those people often find that it's harder than it looks to make these plots work, and if Google and my ability to come up with applicable search terms are any indication, instructions are not readily available. Hence this HOWTO. I hope I'll be forgiven for the cross post. This HOWTO covers two topics: "How to get PL/R to plot something other than PDFs", and "How to make a function return the image data". HOW TO GET PL/R TO PLOT SOMETHING OTHER THAN PDFS The PL/R initiate will often try to create a plot as follows: CREATE FUNCTION plr_plot_simple() RETURNS VOID AS $$ png('test.png') plot(c(1:10)) dev.off() $$ LANGUAGE plr; SImple enough, but when called, this happens: eggyknap=# select plr_plot_simple(); ERROR: R interpreter expression evaluation error DETAIL: Error in X11(paste("png::", filename, sep = ""), width, height, pointsize, : unable to start device PNG CONTEXT: In PL/R function plr_plot_simple This means R is trying to connect to an X server, and can't. R requires an X server to handle most plot devices, and PostgreSQL typically doesn't have a DISPLAY environment variable set to tell it where to find such a server. R's built-in pdf() device is a notable exception, but the PDFs R will create when asked to plot anything particularly complex are very large, and slow to render, making them an unattractive option. The answer is to tell PostgreSQL where to find an X server, and here there are two alternatives: connect to an exist X server, or make a new one. The casual desktop experimenter will probably have an X server running on the same machine as PostgreSQL, and can simply set the DISPLAY variable PostgreSQL uses to point to this existing server. Production PostgreSQL servers run X servers less commonly, so an alternative is to use Xvfb, a "virtual" X server designed for just such an occasion. This is simple enough: $ Xvfb :5 -screen 0 1024x768x24 This will start Xvfb as server 5, defining screen 0 on that server as having a resolution of 1024x768 with 24-bit color depth. Users may also need to configure Xvfb's authorization properly, using the -ac option (allow all hosts to connect to this server), or the -auth <file> option, where <file> is a file that contains a list of hosts allowed to connect to this Xvfb server. If this is running on the PostgreSQL server itself, most likely this file just contains "localhost" on one line. The PostgreSQL server will have to know where to connect to this server; for the Xvfb command line given above and assuming Xvfb runs on the same server as PostgreSQL, the DISPLAY variable is ":5.0". Most of the time, a script in /etc/init.d calls pg_ctl to start the database when the system boots, and the DISPLAY variable can be set there. Users of Debian packages for PostgreSQL may find that starting the package abstracts pg_ctl far enough away that it's difficult to tell in what script the variable should be set; as it turns out the packagers have created a new file for the sole purpose of managing environment variables. In my box it's called /etc/postgresql/8.2/main/environment, and can just contain the following line: DISPLAY = ":5.0" Note that since PL/R requires a properly set R_HOME environment variable, PL/R users may already have had to deal with setting environment variables inside of PostgreSQL. In fact, R provides a nice function to examine the server's environment variables, called plr_environ(). eggyknap=# select * from plr_environ() where name = 'DISPLAY'; name | value ---------+------- DISPLAY | :5.0 (1 row) With that environment variable set, the original function works: eggyknap=# select plr_plot_simple(); plr_plot_simple ----------------- (1 row) But where's the image? The function put it in a file called test.png, and it turns out that file is in the server's data directory by default. If we want it somewhere else, we need to code the png() call accordingly. Note that the PL/R process runs as the same user as the PostgreSQL server; that user must have proper permissions to create the specified file in the specified location, or it won't work. HOW TO MAKE A FUNCTION RETURN THE IMAGE DATA Telling PostgreSQL to make exciting graphs is all well and good, but if my end goal is to have some client somewhere (for instance, a web application) display the graphs. It's possible to define a function like plr_plot_simple() above, and use pg_read_file() to read the image from the filesystem, but only superusers can call pg_read_file(), so the client would have to connect as a superuser, which is a deal breaker for those web applications interested in staying decently secured. Another option is to make the PL/R function return the binary data. WARNING: The method shown here is fairly convoluted; I'm unconvinced that there exists no simpler/faster/better way of doing it, but this way does work. Although we could probably make PL/R read in the image file as a binary object of some sort and return it, that still creates an image file on the server somewhere. In order to avoid the filesystem entirely, PL/R lets us use the cairoDevice and RGtk2 packages in strange ways, as follows: library(cairoDevice) library(RGtk2) pixmap <- gdkPixmapNew(w=500, h=500, depth=24) asCairoDevice(pixmap) plot(1:10) plot_pixbuf <- gdkPixbufGetFromDrawable(NULL, pixmap, pixmap$getColormap(), 0, 0, 0, 0, 500, 500) buffer <- gdkPixbufSaveToBufferv(plot_pixbuf, "png", character(0), character(0))$buffer I freely admit I don't know GTK well enough to translate this function. Thanks to Michael Lawrence on the R-help mailing list for sample code. I *do* know that if using Xvfb, it is important that the depth parameter in the gdkPixmapNew() function matches the color depth of the Xvfb server, or, perhaps, one of the options specified with Xvfb's -pixmaps argument. At the end of this function, the buffer variable is a raw vector containing PNG image data. I've tried using the rawToChar() function in R to convert that vector to a string and return it to PostgreSQL as a TEXT type, but haven't been able to make it work yet. An alternative is to return the buffer vector directly as an array, as follows: CREATE OR REPLACE FUNCTION plr_plot_return_data() RETURNS CHARACTER[] AS $$ library(cairoDevice) library(RGtk2) pixmap <- gdkPixmapNew(w=500, h=500, depth=24) gdkDrawableSetColormap(pixmap, gdkColormapGetSystem()) asCairoDevice(pixmap) plot(c(1:10)) plot_pixbuf <- gdkPixbufGetFromDrawable(NULL, pixmap, pixmap$getColormap(), 0, 0, 0, 0, 500, 500) buffer <- gdkPixbufSaveToBufferv(plot_pixbuf, "png", character(0), character(0))$buffer return(buffer) $$ LANGUAGE plr; This returns a character array containing the PNG data of this plot, hex encoded. There are several ways to use these data; one example is the following mod_perl script: use DBI; use strict; my $r = shift; $r->send_http_header("image/png"); my $dbh = DBI->connect("DBI:Pg:dbname=eggyknap;host=localhost", 'eggyknap', ''); die "Failed to connect: $!" unless ($dbh); my $sth = $dbh->prepare("SELECT ARRAY_TO_STRING(plr_plot_return_data(), '')"); my @array = $sth->fetchrow_array; $r->print(pack "H*", $array[0]); Lines one and two load Perl's database driver and tell Perl to warn me if I start doing dangerous things with variables, respectively. Line three retrieves an object representing the HTTP response my script should generate, and line four adds a content type header to that response to say that it's PNG data I'm returning. In the next four lines, I connect to the database and call the PL/R function defined above, using the ARRAY_TO_STRING function to tell PostgreSQL to convert the return value to one big string before sending it back to the Perl script. The last line takes the data, which are still hex-encoded, uses the pack function to decode them, and prints the result into the HTTP response object. And on my box, anyway, it works, though I wouldn't describe it as particularly fast. I'm guessing that returning the data from the PL/R function as a string instead of an array might speed things up, as well as avoiding having to pack the data within the Perl script, but I've been unable to make those things work. In any case, it's certainly easier than finding something other than R if the plot is complex enough. - Josh / eggyknap ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match