Re: varchar vs. text + constraint/triggers was: Help request to improve function performance

Karsten Hilbert <Karsten.Hilbert@xxxxxxx> · Thu, 23 Apr 2009 13:57:46 +0200

On Thu, Apr 23, 2009 at 01:21:05PM +0200, Ivan Sergio Borgonovo wrote:

> I'll try to rephrase to check if I understood and for reference.
> 
> varchar is slower than text since it has to do some "data type
> check".

Yes but no. It is said to be slower because it has to do a
data length check, not a data *type* check. Oh, did you mean
"check inherent to the data type" ?

> text is faster
... than varchar(something)

> but if you add a check... it gets slower
... than itself without the check, yes

> (slower than varchar?, faster?).
subject to testing

> constraint and trigger should have the advantage that in case of
> refactoring you're not going to touch the table definition
as far as the data types are constrained, the "core" table
definition so to speak

> that *may* end in being faster.
Yes, but I wasn't concerned about faster with respect to
which method for constraints only about which datatype
(which distinction, in the case of TEXT vs VARCHAR, is
somewhat arbitrary).

> But... if in postgresql implementation varchar is just text with a
> check... how can a change in type be faster?

A change of column datatype rewrites the table (again, I am
not 100% sure whether this applies for VARCHAR(x) <->
VARCHAR(y) and VARCHAR(x) <-> TEXT) since they are
essentially the same type with or without a check).

> On the other end... you're preferring text just because they have
> the same implementation (modulo check) in Postgresql... but it may
> not be so in other DB.
> So *maybe* other DB do some optimization on varchar vs. text.
True enough but I wasn't talking about those. Seref asked
about implementing archetypes on PostgreSQL.

> Nothing can handle strings of infinite length, and much before
> reaching infinite I'll get in trouble.
> People read differently what you'd like to say writing varchar(N).

> Most people read:
> 1) we expect a length around N
> Fewer people read:
> 2) There is something not working if we get something larger than N

VARCHAR(N)

VAR  - variable something
CHAR - characters

-> so, likely, variable *number* of characters because
   it better store variable characters ;-)

(N) - some boundary condition

so, either:

- exactly N (but, then, why *VAR*char ?)
- at least N (huh ?, but, well)
- at most N

> But it may also mean:
> 3) if we get something larger than N something is going to explode
> I think the same "ambiguity" is carried by check().
> Anyway for a sufficiently large N 2) and 3) can be valid.

No doubt.

> Supposing the cost of loosing an insert for an unpredicted large
> value of N is high I'd be tempted to set N to at least protect me
> from 3) but I bet people may interpret it as 1).
If you want PostgreSQL to help protect you from the risk of
out-of-memory error, then, yes, it can help a tiny bit to
use VARCHAR(N) where N = "reasonable" (due to earlier
warning) instead of letting PG go to the limits with TEXT.
Agreed.

> In my experience anyway varchar is a good early warning for troubles
> and the risk of being misunderstood/get caught by implementation
> dependent gotcha writing varchar(N) where N mean 3) largely
> encompass the risk of loosing an insert you didn't have to lose.
I see. That's surely a valid point of view.

> Maybe I've spotted a potential advantage of check over varchar.
> If you use some kind of convention to name checks you could
> remove/re-apply them easier than spotting varchars().
> The name of the constraint may contain metadata to help you.
> The name of the constraint may also suggest why it's there to your
> colleagues.
> But this works just if your implementation perform similarly on text
> over varchar().
No, the self-documentation advantage is there regardless of
performance. But the choice is a tradeoff either way, that's
for sure.

I think we may have gotten to a point where we won't help
the OP much :-)

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general