Re: More then 1600 columns?

"Clark C. Evans" <cce@xxxxxxxxxxxxxx> · Fri, 12 Nov 2010 15:24:36 -0500

On Fri, 12 Nov 2010, Tom Lane wrote:
> Generally, wanting more than a few dozen columns is a 
> good sign that you need to rethink your schema design.
> What are you trying to accomplish exactly?

Generally speaking, yes.  However, survey instruments are a 
very legitimate design where this is not only correct, but
quite desirable.  Sure -- in an ideal world you might be 
able to turn your instrument into something that is a highly 
normalized structure. However, it's simply not practical.

PostgreSQL absolutely should look into supporting this.  While
the storage may not be ideal, it is an increasingly common case.
Perhaps the storage engine could permit a set of columns to be
tagged for ``sparse column storage``?

On Fri, 12 Nov 2010, Mark Michell wrote:
> Yes I understand that this is "bad design" but what we are doing is
> storing each form field in a survey in its own column. For very long
> surveys we end up with thousands of elements. 

It's *not* necessarily bad design.  If you've got a survey 
instrument that has 1000 unique data points, it is a reasonable
thing to do.  We (Prometheus Research) have these sorts of 
things all the times. 

#1  You want to store and query them based on specific fields.

#2  You want to enforce the data integrity of each column
    by using types, enums and other techniques.

#3  You want to integrate this structure with the other
    aspects of your database.

> I know storing in an array is possible but it makes it
> so much easier to query the data set when each element
> is in its own field.  comments on why I should not do this

Using HSTORE or using an EAV table completely undermines
the ability to ensure column-level typing, constraints,
and... use SQL.  With arrays / EAV table structure, you 
end up having to re-invent your query language. It is a
non-solution.

> the possible alternatives

The work-around we use is to break our instrument into
one table per ``section``.  We also save each section as the
user completes that part of the instrument.  This has the
advantage of allowing you to model repeating sections.  We
typically enforce column uniqueness across all sections.
This requires your application and query tool to be a bit 
smart about making a virtual table with all 1000+ columns. 
Kinda sucks, but it works.

...

I guess the other option is to look at a RDF data store where
you use SPARQL to query the data.  OpenJena SDB is implemented
as a 3-table storage in PostgreSQL if you wish to keep the same
database for both your assessment storage and other tables. I'm 
not lovin' this option, but it's on the table for our company.

Instead, I'd absolutely love to see PostgreSQL add a variant
scalar type (for when the type of a column changes over the
lifetime of the instrument) and a column storage for Nth+
and additional columns.  For now, the work-around above sort
of works for us.  

Best,

Clark

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general