where to divide application and database

Sam Mason <sam@xxxxxxxxxxxxx> · Thu, 19 Feb 2009 23:43:19 +0000

I was just reading over a reply from David Fetter from a couple of days
ago; the thread is archived[1] but this question doesn't really relate
to it much.  The a question about how to arrange tables and David make
the following comments:

On Tue, Feb 17, 2009 at 09:53:00AM -0800, David Fetter wrote:
> On Tue, Feb 17, 2009 at 04:40:58PM +0000, Sam Mason wrote:
> > >      user_name varchar(50) NOT NULL,
> > 
> > As a general design question; should user_name have a UNIQUE
> > constraint on it?  i.e.
> > 
> >   user_name VARCHAR(50) NOT NULL UNIQUE,
> 
> Yes, it's good to have a UNIQUE constraint, but not this one.  To have
> a sane one, it needs further constraints, and in 8.4, case-insensitive
> text (citext) type.  Here's one that is reasonably sane until citext
> is available.
> 
> user_name TEXT, -- unless length is an integrity constraint, use TEXT instead of VARCHAR.
> 
> then later:
> 
> CREATE UNIQUE INDEX unique_user_name_your_table
>     ON your_table(LOWER(TRIM(user_name)))
> 
> You might also require that whitespace be treated in some consistent
> way, one example of which is simply forbidding whitespace in user_name
> at all.  This you can do via CHECK constraints or a DOMAIN.

The reason behind this appears to be moving some of the checks into the
database and away from the application.

When I've solved similar problems before, I've tended to make the
application more aware of what's going on by having something like:

  user_name VARCHAR(50) NOT NULL UNIQUE
    CHECK (user_name ~ '^[a-z][a-z0-9_]*$')

I explicitly don't want enormous long usernames, 15 characters should be
enough but lets give people something to play with if they want.  It's
partly in case I want to ever get the code interacting with something
like C and want to reduce my exposure to buffer overflows as much as
possible (yes, I'll still be writing code that should be safe from
buffer overflows but I'm human and bugs occur) and there's also the fact
that a 1MB username is going to probably be copied around the place
pretty freely because the code isn't expecting it to be big.  Then
there's layout issues, displaying a bit of text that long is awkward.

Next thing, just a plain UNIQUE constraint.  It allows me to then use
the table as the target of a REFERENCES constraint if I want.  If/when
the optimizer knows that UNIQUE constraints mean that only a single row
is returned then it'll be able to optimize things better as well.

Finally the CHECK constraint is filtering out "bad" usernames, I don't
want people embedding HTML or whatever else in their username to break
my systems so plain text only here and no spaces at the end.

I don't think that either my nor David's is better in general, they
apply to different situations.  It's just interesting to see how
different people solve problems so I was wondering if other people do
things differently.

-- 
  Sam  http://samason.me.uk/

 [1] http://archives.postgresql.org/pgsql-general/2009-02/msg00770.php

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general