Re: Thoughts on user-defined types for talk at Postgres conference?

Stephen Frost <sfrost@xxxxxxxxxxx> · Wed, 6 Mar 2024 12:45:33 -0500

Greetings,

* Guyren Howe (guyren@xxxxxxxxx) wrote:
> But what *really* sets Postgres apart from comparable systems is user defined types. I would like to carefully lay out how to define and use a user-defined type (I don’t think I have time to dig into doing fancy stuff with C functions, so just the basic “user defined sum type”), but also any gotchas.

What I take it you're referring to here are most specifically called
"Composite Types", as discussed here:

https://www.postgresql.org/docs/current/sql-createtype.html

> And I’d like to finish with some thoughts about when and how to use user-defined types. My feeling is that this feature is greatly under-used, mostly because it’s so non-standard. But AFAICT, user-defined types are fine and other than some ugliness due to SQL (mainly needing parentheses in some unexpected places), fields in a user defined type work perfectly well in Postgres’s SQL. I guess you’d need to pull them apart for values returned to clients, but that isn’t difficult.

Yeah, it's a bit grotty when you're trying to reference them (as you
note with the comment about the extra parentheses) but it's also a bit
painful building them up to supply as input (though not impossible).

> So, any gotchas with user defined types? Any thoughts about designing with them?

The biggest gotcha (which is a bit different from the usability issues,
which we discuss above) from my perspective is the serialization of
composite types- we use up a lot more space to store a composite type
that looks like:

CREATE TYPE twoint AS (a int, b int);

then if we create a table as:

CREATE TABLE t1 (a int, b int);

Let's show this, follow the above two commands, do:

CREATE TABLE t2 (c1 twoint);

INSERT INTO t1 VALUES (1,2);
INSERT INTO t2 VALUES ('(1,2)');

=*# select pg_column_size(a) from t1;
 pg_column_size 
----------------
              4
(1 row)

=*# select pg_column_size(b) from t1;
 pg_column_size 
----------------
              4
(1 row)

=*# select pg_column_size(a) from t2;
 pg_column_size 
----------------
             29
(1 row)

We can see it for the whole row too:

=*# select pg_column_size(t1.*) from t1;
 pg_column_size 
----------------
             32
(1 row)

=*# select pg_column_size(t2.*) from t2;
 pg_column_size 
----------------
             53
(1 row)

That's an additional 21 bytes, which is really quite a lot.  What's
included in those 21 bytes are the length (since it's now a
variable-length column to PG, unlike the case with the individual int
columns in the table where we know from the table structure the length
of them), the type information (typmod if there is one and the OID of
the composite type), some other info, and then the actual data of the
two int values inside of the composite type.

I've often wondered if there's a way to reduce this overhead, as I do
think that plus some improvements on the usability side would go a long
way to making composite types more appealing to users.  Still, they are
certainly handy in some instances, I just can't recommend heavy use of
them for large data sets where size is particularly important (such as
in heavy OLTP environments) due to their overhead.

Thanks!

Stephen
Attachment:
signature.asc

Description: PGP signature