Re: Beyond the 1600 columns limit on windows

Chris Travers <chris@xxxxxxxxxxxxxxxxxx> · Wed, 23 Nov 2005 12:15:52 -0800

Evandro's mailing lists (Please, don't send personal messages to this 
address) wrote:

Yes it is exactly that. I will follow you advice and create a 
abstraction layer for the data access that will return the sparse 
dataset using the standard dataset as input.

There is just one thing I disagree you said it that the performance is 
not good, right. However, it is practical! Nothing is easier and more 
practical than keeping the sparse representation inside of the 
database for my application.

I think that you misunderstand the problems that come from doing it that 
way.

The basic issue is this:  the point of creating a system using an RDBMS 
is to separate your data from your application.  This allows for various 
methods of data mining later.  If you are presenting information in your 
DB this way, you are breaking that assumption and so you get very little 
(if any) benefit from using PostgreSQL instead of something like BDB.  
Database design is usually about concise and unambiguous representation 
of data and normalization is a part of this.  This allows various 
applications or ad hoc queries to be able to draw against the data in 
meaningful ways.

The traditional way of representing sparse data is to use a join.

CREATE TABLE bird (
id SERIAL PRIMARY KEY,
description TEXT,
bname TEXT);

CREATE TABLE bird_color (
bird_id INT REFERENCES bird (id),
color TEXT,
is_color BOOL,
UNIQUE (bird_id, color));

Now, you can select bname from bird where (select is_color from 
bird_color where color = 'red' and bird_id = bird.id)

The point is not that this makes it easier or harder from the stand 
point of your application but that it makes it easier to build secondary 
apps against the same data set, and that it avoids various ambiguities 
that could result from secondary data entry/analysis apps.

People have suggested using an array, and that would be possible as well 
(undecided whether this breaks first normal form) but this will 
certainly cause more headache with secondary apps.  Please understand 
that PostgreSQL is designed around a world where these secondary apps 
inevitably get built while a lot of commerical, off the shelf software 
assumes that they won't be.  This is why many of us question the DB 
design of these apps.

I hope that this helps explain some of the why's of this thread.

Best Wishes,
Chris Travers
Metatron Technology Consulting