Re: varchar vs. text + constraint/triggers was: Help request to improve function performance

Ivan Sergio Borgonovo <mail@xxxxxxxxxxxxxxx> · Thu, 23 Apr 2009 13:21:05 +0200

On Thu, 23 Apr 2009 12:00:30 +0200
Karsten Hilbert <Karsten.Hilbert@xxxxxxx> wrote:

> On Thu, Apr 23, 2009 at 11:33:34AM +0200, Ivan Sergio Borgonovo
> wrote:
> 
> > Karsten Hilbert <Karsten.Hilbert@xxxxxxx> wrote:
> > 
> > > On Thu, Apr 23, 2009 at 12:02:13AM +0100, Seref Arikan wrote:
> > > 
> > > > I have a set of dynamically composed objects represented in
> > > > Java, with string values for various attributes, which have
> > > > variable length. In case you have suggestions for a better
> > > > type for this case, it would be my pleasure to hear about
> > > > them.
> > > 
> > > Seref, he's suggesting you use TEXT instead of
> > > VARCHAR(something). In PG it's actually usually *less*
> > > overhead to use the unbounded text datatype (no length check
> > > required).
> > > 
> > > Length checks mandated by business logic can be added by
> > > more dynamic means -- check constraints, triggers, etc which
> > > allow for less invasive change if needed.
> > 
> > Could you point us to some example of a constraint/trigger
> > (etc...) that is going to provide the same checking of varchar
> > and explain (if the code/example... doesn't make it clear) why
> > it should be faster or less invasive?
> 
> check constraint based:
> 
> 	create table foo (
> 		txt text
> 			check (char_length(txt) < 1001)
> 	);
> 
> trigger based:
> 
> 	create function trf_check_length_1000() ... returns
> trigger ... $$...$$;
> 
> 	create table foo (
> 		txt text
> 	);
> 
> 	create trigger check_txt_length before INSERT or
> UPDATE ... execute trf_check_length_1000();
> 
> faster:
> 
> - TEXT is (judging by previous comments on this list)
>   marginally faster than VARCHAR(1000) because a) it runs
>   the same code but b) doesn't have to check for the 1000
>   length
> 
> - other options (VARCHAR, constraint, trigger) incur
>   additional overhead and are thus slower
> 
> less invasive:
> 
> Well, poor wording on my part, perhaps. What I meant is that
> changing a check constraint or trigger appears to be a less
> costly operation on a table than changing the datatype of a
> column (although I seem to remember there being some
> optimizations in place for the case of changing the *length*
> of a varchar).

I'll try to rephrase to check if I understood and for reference.

varchar is slower than text since it has to do some "data type
check".

text is faster but if you add a check... it gets slower (slower than
varchar?, faster?).

constraint and trigger should have the advantage that in case of
refactoring you're not going to touch the table definition that
*may* end in being faster.

But... if in postgresql implementation varchar is just text with a
check... how can a change in type be faster?
If it was a char(N) maybe there would be some kind of optimization
since the length of the data is known in advance... so
shrinking/enlarging a char(N) may have a different cost than
shrinking a varchar(N) that in pg *should* have the same
implementation than text.

On the other end... you're preferring text just because they have
the same implementation (modulo check) in Postgresql... but it may
not be so in other DB.
So *maybe* other DB do some optimization on varchar vs. text.

Somehow I like the idea of considering a varchar a text with a
check, but I think I prefer the "more traditional" approach since
somehow is the "most expected".

Nothing can handle strings of infinite length, and much before
reaching infinite I'll get in trouble.
People read differently what you'd like to say writing varchar(N).
Most people read:
1) we expect a length around N
Fewer people read:
2) There is something not working if we get something larger than N
But it may also mean:
3) if we get something larger than N something is going to explode
I think the same "ambiguity" is carried by check().
Anyway for a sufficiently large N 2) and 3) can be valid.

Supposing the cost of loosing an insert for an unpredicted large
value of N is high I'd be tempted to set N to at least protect me
from 3) but I bet people may interpret it as 1).

In my experience anyway varchar is a good early warning for troubles
and the risk of being misunderstood/get caught by implementation
dependent gotcha writing varchar(N) where N mean 3) largely
encompass the risk of loosing an insert you didn't have to lose.

Maybe I've spotted a potential advantage of check over varchar.
If you use some kind of convention to name checks you could
remove/re-apply them easier than spotting varchars().
The name of the constraint may contain metadata to help you.
The name of the constraint may also suggest why it's there to your
colleagues.
But this works just if your implementation perform similarly on text
over varchar().

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general