Re: [HACKERS] EXISTS optimization

"Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxx> · Fri, 23 Mar 2007 21:28:36 -0500

>>> On Fri, Mar 23, 2007 at  6:04 PM, in message
<b6e8f2e80703231604v72b9dc4dr51eebd62274d53ec@xxxxxxxxxxxxxx>, "Peter Kovacs"
<peter.kovacs.1.0rc@xxxxxxxxx> wrote: 
> On 3/23/07, Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx> wrote:
> [...]
>> That's the good news.  The bad news is that I operate under a management 
> portability dictate which doesn't currently allow that syntax, since not all 
> of the products they want to
> 
> It doesn't really touch the substance, but I am curious: are you not
> even allowed to discriminate between products in your code like:
> if db is 'postresql' then
> ...
> else
> ...
> ?
> 
> What would be the rationale for that?

Anybody who's not curious about that should skip the rest of this email.

Management has simply given a mandate that the software be independent of OS and database vendor, and to use Java to help with the OS independence.  I have to admit that I am the architect of the database independence solution that was devised.  (The choice of Java for the OS independence has been very successful.  We have run our bytecode on HP-UX, Windows, Sun Solaris, and various flavors of Linux without having to compile different versions of the bytecode.  Other than when people get careless with case sensitivity on file names or with path separators, it just drops right in and runs.

For the data side, we write all of our queries in ANSI SQL in our own query tool, parse it, and generate Java classes to run it.  The ANSI source is broken down to "lowest common denominator" queries, with all procedural code covered in the Java query classes.  So we have stored procedures which can be called, triggers that fire, etc. in Java, issuing SELECT, INSERT, UPDATE, DELETE statements to the database.  This allows us to funnel all DML through a few "primitive" routines which capture before and after images and save them in our own transaction image tables.  We use this to replicate from our 72 county databases, which are the official court record, to multiple central databases, and a transaction repository, used for auditing case activity and assisting with failure recovery.

The problem with burying 'if db is MaxDB', 'if db is SQLServer', 'if db is PostgreSQL' everywhere is that you have no idea what to do when you then want to drop in some different product.   We have a plugin layer to manage known areas of differences which aren't handled cleanly by JDBC, where the default behavior is ANSI-compliant, and a few dozen to a few hundred  lines need to be written to modify that default support a new database product.  (Of course, each one so far has brought in a few surprises, making the plugin layer just a little bit thicker.)

So, to support some new syntax, we have to update our parser, and have a way to generate code which runs on all the candidate database products, either directly or through a plugin layer.  If any of the products don't support multi-value row value constructors, I have a hard time seeing a good way to cover that with the plugin.  On the subject issue, I'm pretty sure it would actually be less work for me to modify the PostgreSQL optimizer to efficiently handle the syntax we do support than to try to bend row value constructors to a syntax that is supported on other database products.

And, by the way, I did take a shot on getting them to commit to PostgreSQL as the long-term solution, and relax the portability rules.  No sale.  Perhaps when everything is converted to PostgreSQL and working for a while they may reconsider.

-Kevin