
Re: JOIN column maximum


 



Lee Hachadoorian wrote:
On 01/05/2012 06:18 PM, Tom Lane wrote:
Are there really 23000 populated values in each row?  I hesitate to
suggest an EAV approach, but it kinda seems like you need to go in that
direction.  You're never going to get decent performance out of a schema
that requires 100-way joins, even if you avoid bumping up against hard
limits.
Many of the smaller geographies, e.g. census tracts, do in fact have data for the vast majority of the columns. I am trying to combine it all into one table to avoid the slowness of multiple JOINs (even though in practice I'm never joining all the tables at once). EAV sounds correct in terms of normalization, but isn't it usually better performance-wise to store write-once/read-many data in a denormalized (i.e. flattened) fashion? One of these days I'll have to try to benchmark some different approaches, but for now planning on using array columns, with each "sequence" (in the Census sense, not the Postgres sense) of 200+ variables in its own array rather than its own table.
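The array-column plan described above might look something like this (a hypothetical sketch; the table and column names are illustrative, not from the thread):

```sql
-- One array column per Census "sequence" instead of one table per sequence.
CREATE TABLE acs_tract (
    geoid   varchar(20) PRIMARY KEY,
    seq0001 numeric[],  -- the ~200 estimates from sequence 1
    seq0002 numeric[]   -- ... and so on for each sequence
);

-- A single variable is then addressed by its position within its sequence:
SELECT geoid, seq0001[5] AS some_estimate
FROM acs_tract;
```

This keeps each row well under the column limit while avoiding the 100-way join, at the cost of addressing variables by array position rather than by name.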

EAV is not necessarily more correct than what you're doing.

The most correct solution is one where your database schema defines, and the DBMS enforces, all of the constraints or business rules on your data, so that you can not put something in the database that violates the business rules.

Traditional EAV, if you're talking about the common binary table of unconstrained field-name,field-value pairs, is not an improvement.
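For illustration, the "traditional EAV" shape being criticized is essentially this (a hypothetical sketch):

```sql
-- Nothing here stops a misspelled field name or a malformed value:
-- the DBMS can enforce no type, range, or existence constraints.
CREATE TABLE eav (
    entity_id  integer,
    field_name text,  -- unconstrained: any spelling is accepted
    field_val  text   -- unconstrained: everything is stored as text
);
```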

A more correct solution is to use different columns for things with different business rules or data types. If the DBMS can't handle this then that is grounds for improving the DBMS.

There's no reason that joins *have* to be slow, and in some DBMS designs you can join in linear time; it's all about how you implement.

This all being said, 23K values per row just sounds wrong, and I can't imagine any census form having that many details.

Do you, by chance, have multiple values of the same type that are in different fields, e.g. telephone_1, telephone_2 or child_1, child_2, etc.? You should take any of those and collect them into array-typed fields, or into separate tables with just telephone or child columns. Or do you, say, have a set of coordinates in separate fields? Or you may have other kinds of redundancy within single rows that are best normalized into separate rows.
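As a hypothetical before/after for repeated same-type fields (the person and telephone names are made up for illustration):

```sql
-- Before: telephone_1, telephone_2, ... spread across one wide row.

-- After, the array form:
ALTER TABLE person ADD COLUMN telephones text[];

-- Or after, the separate-table form, one row per value:
CREATE TABLE person_telephone (
    person_id integer REFERENCES person (id),
    telephone text NOT NULL,
    PRIMARY KEY (person_id, telephone)
);
```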

With 23K values, these probably have many mutual associations, and you could split that table into a bunch of other ones where columns that relate more closely together are collected.
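A vertical split along those mutual associations might be sketched like this (hypothetical names; the real groupings would follow the data's own structure):

```sql
-- Columns that relate closely live together; the shared key
-- lets you join a group back in only when you actually need it.
CREATE TABLE tract_population (
    geoid      varchar(20) PRIMARY KEY,
    total_pop  integer,
    median_age numeric
);

CREATE TABLE tract_housing (
    geoid       varchar(20) PRIMARY KEY REFERENCES tract_population (geoid),
    total_units integer,
    median_rent numeric
);
```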

What I said in the last couple of paragraphs is probably the first thing to fix, so that you end up with both a better design and better performance on the DBMS you have.

-- Darren Duncan

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

