
Re: Postgres SQL Syntax

----- Original Message ----- From: "Merlin Moncure" <mmoncure@xxxxxxxxx>

> the open standard to convert data from one database to another,
> unfortunately, is SQL.  SQL is incomplete, illogical, and obscure, so
> here we are.

The same can be said about any programming language, can it not?

Even languages as powerful and expressive as C++ and Perl have elements that seem illogical. That they are incomplete is substantially demonstrated by the existence of the Boost library, most of which ought to be added to the C++ standard library, and of CPAN; though for Perl there is no standard, so it is hard to say definitively where the language ends and developers' libraries begin. I am sure that all of the programming students I have taught have found C++ template metaprogramming obscure and very difficult to fathom, but I don't think it is possible at this time to beat Perl, and especially its object model, for obscurity. I will concede, though, that this impression may be an artifact of my being used to the object models in C++ and Java, so that the Perl object model simply looks a bit weird to me.

For the same reason, I find some aspects of SQL difficult to fathom. That may be because I am not as familiar with set theory as I am with algebra, calculus, and analytic geometry, or it may be an artifact of the languages with which I am most comfortable. I don't know whether you can do statistical analyses, such as nonlinear least squares, time series analysis, or chi-squared tests, within SQL, but when faced with such a problem I tend to follow the path of least resistance and export the data from the database into my client code, doing whatever analysis I need in C++ or Java. But if there is a lot of data, I suspect there would be much less impact on network traffic, and probably better performance, if the analysis could be done in a SQL stored procedure. One of the more common analyses I need to do involves time series, sometimes with some kind of smoothing (such as a moving average) applied before the analysis proper.
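Just to sketch what I mean by the smoothing step, at least a moving average can be pushed down into plain SQL. The table readings(ts date, value numeric) below is only a made-up example, and the date arithmetic (a date minus an integer) is Postgresql's; other products would need their own date functions:

-- Made-up table: readings(ts date, value numeric).
-- Seven-day moving average via a correlated subquery; nothing
-- vendor-specific apart from Postgresql's date - integer arithmetic.
SELECT r.ts,
       r.value,
       (SELECT avg(r2.value)
          FROM readings r2
         WHERE r2.ts BETWEEN r.ts - 6 AND r.ts) AS moving_avg_7d
  FROM readings r
 ORDER BY r.ts;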

> moving data from mysql to postgresql is easy...its the table schemas
> that are tough.  If you have the table schemas done, you can


When I design my databases, I normally assume that I may have to migrate the data from one RDBMS product to another, for whatever reason, so I keep the products I am most likely to have to support running here. I create a SQL script to create my databases, aiming to use the highest common factor across those products' SQL, and I test it on each of them to verify that I really did find the highest common factor. I then have a simple Perl script to execute the script; it "knows" which tools to use based on a configuration file that holds the specifics for MySQL, Postgresql, or MS SQL Server, and a command-line parameter that specifies which database to use.
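To illustrate what I mean by the highest common factor (this is a made-up table, not one of my actual schemas), such a script confines itself to types and syntax that MySQL, Postgresql, and MS SQL Server all accept, so the same file can be fed to any of them unchanged:

-- Illustrative only: types and constraint syntax common to MySQL,
-- Postgresql, and MS SQL Server, so one script serves all three.
CREATE TABLE customer (
    customer_id   INTEGER        NOT NULL,
    family_name   VARCHAR(100)   NOT NULL,
    given_name    VARCHAR(100)   NOT NULL,
    credit_limit  NUMERIC(10,2)  NOT NULL,
    PRIMARY KEY (customer_id)
);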

> mysqldump --compatible=postgresql | psql
>
> which should work for 90% of tables, because mysql supports only a
> subset of the types postgresql supports.
>
> the schemas are a bit trickier...you have to do them by hand or use a
> conversion tool.  one such tool is DTS.  if you look around you might
> find something else though.

There's more than one way to skin a cat. My inclination is to take a brute-force, albeit brain-dead, approach. One of my irritants is that none of the RDBMS products appear to support the same way of importing data from a flat file (CSV files, for instance). But at least they all do it somehow, and they are all able to export a table to, say, a CSV file. It therefore becomes trivially easy to move data, even in the absence of a tool like DTS, by exporting it to a CSV file and then reading that file into the target RDBMS. This is something I've done countless times. Not having found a way to determine the schema programmatically, though, so that my Perl or Java or C++ code could be generic enough to apply to any of my databases, I find myself spending an hour or two writing a simple script, usually in Perl, to handle each table in sequence.

There are, of course, things to watch, such as keeping the table creation statements in a different script from the one that creates indices and especially foreign keys, so that the tables are ready when the data is to be loaded but the constraints are created only after the data is loaded, purely for performance reasons. I learned the hard way that loading slows down dramatically if the database has to continually check constraints, so I create constraints after loading data I know to be valid, and leave them in place only while new data is being stored.
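In Postgresql, for example, both the export and the import step can be done with COPY; the table name and file path below are only placeholders, and MySQL and MS SQL Server have their own equivalents (LOAD DATA INFILE, BULK INSERT, and the like), which is the sort of per-product detail I keep in the configuration file mentioned above:

-- Export from the source database; table name and path are placeholders.
-- Server-side COPY needs appropriate privileges; \copy in psql is the
-- client-side alternative.
COPY customer TO '/tmp/customer.csv' WITH CSV;

-- Import into the target database, with the tables already created but
-- the indices and foreign keys not yet in place, as described above.
COPY customer FROM '/tmp/customer.csv' WITH CSV;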

I guess I am recommending that the schemas be done by hand, regardless of whether you're creating a new database or repairing or migrating an old one, even if the original developer wasn't considerate enough to create, or provide, an appropriate SQL script to build the database de novo. It seems to me risky to assume that the tools we like now will still exist ten years from now. Anyone remember IBM's OS/2? That was my favourite OS, and it was arguably better than any flavour of Windows available at the time. But even though I used it exclusively ten years ago, it is no longer a viable option for any machine I now use. I am sure anyone reading this who is of my generation could produce a long list of products that they used and liked which no longer exist or are no longer commercially viable. C++ is still a good option for many kinds of application development, but I don't think many people are using Watcom's compiler to produce commercial applications. SQL will likely exist ten years from now, but will our favourite RDBMS? I won't make that prediction, but I think I can minimize the risk by creating SQL scripts that use the highest common factor across the RDBMS products at my disposal. That way I should be able to use my scripts, with minimal pain, regardless of what products are available at the time.

I do much the same in my C++ coding. I try to use the highest common factor of the language as implemented by the suite of compilers I use, and keep the implementation-specific stuff to a minimum, in completely separate compilation units (and even in different directories). I see it as inevitable that significant refactoring will be required, especially when migrating from one suite of tools to another, so I plan an architecture for my code that should make that as easy as possible. Similarly, for moving data around, I developed my approach to be as easy and reliable as possible, even if it is not the most elegant or efficient.

Cheers,

Ted


