Hey folks,

Having not had to worry about character encoding in the past, we blithely
used the SQL_ASCII encoding and had the application do the input filtering.
We have now reached the point where we would like the DB to enforce the
character encoding for us, and we have chosen to go with the LATIN9 encoding
with C collation and ctype attributes.

We ended up with a PG 8.4.7 DB on Ubuntu Server 10.04 LTS that looks like
this:

  psql -l
                                List of databases
     Name    |  Owner   | Encoding | Collation | Ctype |   Access privileges
  -----------+----------+----------+-----------+-------+-----------------------
   enc_test  | postgres | LATIN9   | C         | C     |
   postgres  | postgres | LATIN9   | C         | C     |
   template0 | postgres | LATIN9   | C         | C     | =c/postgres
                                                       : postgres=CTc/postgres
   template1 | postgres | LATIN9   | C         | C     | =c/postgres
                                                       : postgres=CTc/postgres
  (4 rows)

I then created a simple table in enc_test:

  create table xyz ( str text );

and tried inserting a text string with characters that are invalid for the
encoding, which (to my surprise) succeeded:

  insert into xyz values ( E'abc \342\200\223 cef' );
  INSERT 0 1

(This is a sample from our DB where a Unicode sequence was able to be
inserted.) I know that \200 and \223 are not valid characters in the LATIN9
character set.

My question is: are we expecting too much, or did we do something wrong? I
would have expected the insert to fail.

We wish to stick with single-byte characters (hence the LATIN9 encoding) as
we have legacy code that will fail with multi-byte characters (which
unfortunately precludes UTF-8 :-( ).

TIA,

Bosco.
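
P.S. In case it helps anyone reproduce this, a database with the same
encoding/collation/ctype settings can be created with something like the
following (just a sketch; the exact command we used may have differed):

  CREATE DATABASE enc_test
    TEMPLATE template0
    ENCODING 'LATIN9'
    LC_COLLATE 'C'
    LC_CTYPE 'C';

And a quick way to see what actually landed in the table is something like
this (convert_to() simply echoes the stored bytes as bytea here, since the
database is already LATIN9):

  SELECT str,
         convert_to(str, 'LATIN9') AS raw_bytes,  -- stored bytes, shown as bytea
         octet_length(str)         AS byte_count
  FROM   xyz;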