Re: PostgreSQL Gotchas

Chris Travers <chris@xxxxxxxxxxxxxxxxxx> · Sun, 16 Oct 2005 13:11:04 -0700

Martijn van Oosterhout wrote:

On Sat, Oct 15, 2005 at 05:46:03PM -0700, Chris Travers wrote:

This is largely what I have been suggesting.  However, you still have 
two (small) problems that could be solved fairly easily I think.  Take 
Tom's example:

SELECT MAX(relpages) FROM pg_class;

Now, I am perfectly happy to require relpages and pg_class to be quoted 
(if you don't want to, there are always views....), however, MAX is also 
an identifier, which if I understand Tom correctly is also folded to 
lower case.  If you fold MAX to upper case, you may not find the 
aggregate.  I don't see an easy answer aside from having separate 
entries for "max" and "MAX" which are functionally the same.
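To make the lookup problem concrete, here is a rough sketch in Python. It is a toy model of case folding, not how the PostgreSQL parser or catalogs actually work; the `fold` and `lookup` helpers and the `CATALOG` set are invented for illustration only.

```python
# Toy model of identifier folding; names here are illustrative assumptions.

def fold(identifier, mode="lower"):
    """Return the name an identifier resolves to under a folding mode.

    Double-quoted identifiers keep their exact spelling. Unquoted ones are
    folded to lower case (what PostgreSQL does) or upper case (the standard).
    """
    if len(identifier) >= 2 and identifier[0] == '"' and identifier[-1] == '"':
        # Strip the outer quotes and undo doubled quotes inside the name.
        return identifier[1:-1].replace('""', '"')
    return identifier.lower() if mode == "lower" else identifier.upper()


# Hypothetical stand-in for the function catalog: names stored in lower case.
CATALOG = {"max", "nextval", "currval", "substring"}


def lookup(identifier, mode, catalog=CATALOG):
    """Report whether the folded identifier matches a catalog entry."""
    return fold(identifier, mode) in catalog


if __name__ == "__main__":
    print(lookup("MAX", "lower"))    # True: folds to max
    print(lookup("MAX", "upper"))    # False: MAX is not in the catalog
    print(lookup('"max"', "upper"))  # True: quoting bypasses folding
    # Duplicate upper-case entries, as suggested above, restore the lookup.
    both = CATALOG | {name.upper() for name in CATALOG}
    print(lookup("MAX", "upper", both))  # True
```

The last call is the "separate entries" idea: with both spellings present, an unquoted MAX is found whichever way it is folded.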

I think you're certainly waving away a lot of potential issues with saying
that "as long as system identifiers are quoted we're fine". Tom's
example is just the tip of the iceberg. 

- consider functions like nextval/currval/substring. Some of these are
created and used internally too, should they always be quoted? Would
these functions all appear twice in the functions list?

I think I mentioned that the case issue is of particular importance for 
functions and aggregates because we don't want to type:

SELECT "max"("relpages") FROM "pg_class";

and this is definitely not standard anyway.

I think any functions that are shipped with PostgreSQL should come in 
both an upper case and lower case form.  I am willing to commit to 
building these in the system catalogs.  Same with basic types (like 
"INT" and "int") for example.  Or another option would just be to lower 
case all reserved SQL words though this would lead to a wide range of 
possible issues.

- consider oid,xmin,xmax,etc are they system identifiers? You can't
provide two copies of those, since each column can only appear once in
the catalogs.

No, you can't, but if you need to refer to oid, xmin, and xmax from 
outside the backend, you can always quote them.  Besides, if you really 
need that level of compatibility, consider the following....

CREATE VIEW "PG_CLASS" AS
SELECT "oid" AS "OID", "relname" as "RELNAME", .....
FROM "pg_class";
and then create associated rules.  So yes, we can have two entries in 
the system catalogs if we need to, because views are wonderful things 
:-)  The only real exception is with functions and aggregates, and these 
are bigger issues.

BTW, not to split hairs here too much, but for those customers that need 
this now, I suppose one could always use this same trick.

CREATE SCHEMA "PUBLIC";
CREATE VIEW "PUBLIC"."TABLE1" AS
SELECT "field1" AS "FIELD1"....
FROM "public"."table1";
CREATE RULE irule AS ON INSERT TO "PUBLIC"."TABLE1"
DO INSTEAD INSERT INTO "public"."table1" ("field1"...) VALUES 
(NEW."FIELD1"...);
etc....

Could be a lot of work, but it should be possible to write a Perl script 
to generate these for you :-)
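As a rough idea of what such a generator might look like, here is a minimal sketch, in Python rather than Perl. The function names and the rule name suffix `_ins` are made up for this example. It assumes the column list is passed in by hand rather than read from the catalogs, that the upper-case schema already exists, and it only emits the INSERT rule; UPDATE and DELETE rules would follow the same pattern.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def upper_view_sql(schema: str, table: str, columns: Sequence[str]) -> str:
    """Build DDL for an upper-case view over a lower-case table.

    Hypothetical helper: emits CREATE VIEW plus an ON INSERT rule that
    redirects inserts on the view to the underlying table.
    """
    src = f"{quote_ident(schema)}.{quote_ident(table)}"
    dst = f"{quote_ident(schema.upper())}.{quote_ident(table.upper())}"
    select_list = ", ".join(
        f"{quote_ident(c)} AS {quote_ident(c.upper())}" for c in columns
    )
    insert_cols = ", ".join(quote_ident(c) for c in columns)
    new_vals = ", ".join(f"NEW.{quote_ident(c.upper())}" for c in columns)
    rule = quote_ident(f"{table}_ins")  # rule name is an arbitrary choice
    return "\n".join([
        f"CREATE VIEW {dst} AS",
        f"SELECT {select_list}",
        f"FROM {src};",
        f"CREATE RULE {rule} AS ON INSERT TO {dst}",
        f"DO INSTEAD INSERT INTO {src} ({insert_cols})",
        f"VALUES ({new_vals});",
    ])


if __name__ == "__main__":
    print(upper_view_sql("public", "table1", ["field1", "field2"]))
```

Everything is quoted on both sides on purpose, so the output does not depend on which way the server folds unquoted names.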

Sounds like a project for me :-P

- consider pg_dump, psql and other such tools with intimate knowledge
of the catalogs; they would all need to be updated.

If you need to create backward compatibility via views, my proposal 
would allow for that.  Also if this is a GUC variable, there is no 
reason it can't be set by the application where needed.  Also, as far as 
I am concerned, I would rather leave the system catalogs alone and 
change them as *little* as possible for these reasons.

- consider the number of third-party apps that use these "system
identifiers". Updating all of those is a massive task, for what gain?

See above regarding views.  The only difference in shipping PostgreSQL 
with this mode available is that it gives customers an easy means to 
avoid the problem and gives us a centralized solution.

There is some sympathy around here for the view that in this case the
standard is simply wrong. As said on this thread already, uppercase
only was appropriate twenty years ago but not now. Older
programming languages tended to have their keywords in uppercase, yet
no modern language would consider such an act.

So, I think in the long run there is going to be some sympathy for
helping people with "QUOTEDUPPERCASE" identifiers to port their stuff
but I don't think PostgreSQL itself will ever default to folding up.

Who said anything about defaulting?  I simply said an option should 
be present.  FWIW, I prefer the way PostgreSQL does it, but I have to 
recognize that it will bite people migrating from RDBMSs that do follow 
the standards.

Hence the discussion about folding quoted identifiers also, which
solves the issue without masses of changes everywhere.

Curiously, MySQL does as we do, although they don't accept standard
quoting at all (using backquotes). Do people complain about them too?

MySQL's unquoted table names being case sensitive is even more broken 
than ours ;-)  Besides if people using MySQL cared about standards, I 
don't think they would still be using MySQL...

It seems to me that the first step in any solution is going to be 
creating a tool that automatically generates views in appropriate schema 
using upper case (or maybe optionally lower case) table and column 
names.  This should be fairly simple to do.  Then such a tool could be 
used to provide whatever sort of compatibility we need.  Indeed this may 
in fact be more flexible in helping people migrate to 
BrokenSystemsLikeMySQL because it would allow you to create views off 
tables using lower case names as well.

A second step will be to ensure that the backend really does treat 
identifiers as already double-quoted.

However, this is only a partial fix.  It doesn't solve the larger issue 
of (badly written, albeit standards-compliant) queries like:

SELECT "MYCOLUMN" FROM MYTABLE;

My fundamental point is that the above statement means something 
specific according to the SQL standard, and it means something 
*different* in PostgreSQL.  The time may not be ripe for a solution, but 
if it is at least on the TODO list, then maybe people will be bitten by 
it less if and when we actually do support this behavior.
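A small Python sketch may help show why the same statement resolves differently. It is a simplified model, not the server's name resolution, and it assumes (my assumption, not something stated in the thread) that the table was created with unquoted names, i.e. CREATE TABLE mytable (mycolumn int). The `fold` and `query_resolves` helpers are invented for this example.

```python
def fold(identifier, mode):
    """Fold an unquoted identifier; keep a double-quoted one verbatim."""
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        return identifier[1:-1]
    return identifier.upper() if mode == "upper" else identifier.lower()


def query_resolves(mode):
    """Check whether SELECT "MYCOLUMN" FROM MYTABLE finds its objects.

    mode "upper" models the SQL standard, mode "lower" models PostgreSQL.
    """
    # Assumed DDL: CREATE TABLE mytable (mycolumn int);
    # The unquoted names are stored after folding.
    stored_table = fold("mytable", mode)
    stored_columns = {fold("mycolumn", mode)}

    # The query from the text: an unquoted table name, a quoted column name.
    table_found = fold("MYTABLE", mode) == stored_table
    column_found = fold('"MYCOLUMN"', mode) in stored_columns
    return table_found and column_found


if __name__ == "__main__":
    print(query_resolves("upper"))  # True: stored as MYTABLE / MYCOLUMN
    print(query_resolves("lower"))  # False: column is stored as mycolumn
```

Under upper-case folding the quoted "MYCOLUMN" matches the stored name; under lower-case folding the table is still found but the column is not.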

Best Wishes,
Chris Travers
Metatron Technology Consulting
