Evandro's mailing lists (Please, don't send personal messages to this
address) wrote:
Yes it is exactly that. I will follow you advice and create a
abstraction layer for the data access that will return the sparse
dataset using the standard dataset as input.
There is just one thing I disagree you said it that the performance is
not good, right. However, it is practical! Nothing is easier and more
practical than keeping the sparse representation inside of the
database for my application.
I think that you misunderstand the problems that come from doing it that
way.
The basic issue is this: the point of creating a system using an RDBMS
is to separate your data from your application. This allows for various
methods of data mining later. If you are presenting information in your
DB this way, you are breaking that assumption and so you get very little
(if any) benefit from using PostgreSQL instead of something like BDB.
Database design is usually about concise and unambiguous representation
of data and normalization is a part of this. This allows various
applications or ad hoc queries to be able to draw against the data in
meaningful ways.
The traditional way of representing sparse data is to use a join.
CREATE TABLE bird (
id SERIAL PRIMARY KEY,
description TEXT,
bname TEXT);
CREATE TABLE bird_color (
bird_id INT REFERENCES bird (id),
color TEXT,
is_color BOOL,
UNIQUE (bird_id, color));
Now, you can select bname from bird where (select is_color from
bird_color where color = 'red' and bird_id = bird.id)
The point is not that this makes it easier or harder from the stand
point of your application but that it makes it easier to build secondary
apps against the same data set, and that it avoids various ambiguities
that could result from secondary data entry/analysis apps.
People have suggested using an array, and that would be possible as well
(undecided whether this breaks first normal form) but this will
certainly cause more headache with secondary apps. Please understand
that PostgreSQL is designed around a world where these secondary apps
inevitably get built while a lot of commerical, off the shelf software
assumes that they won't be. This is why many of us question the DB
design of these apps.
I hope that this helps explain some of the why's of this thread.
Best Wishes,
Chris Travers
Metatron Technology Consulting