Evandro's mailing lists (Please, don't send personal messages to this
address) wrote:
It has nothing to do with normalisation. It is a program for
scientific applications.
Data values are broken into columns to allow multiple linear regression
and multivariate regression tree computations.
Having done similar things in the past, I wonder if your current DB
design includes a column for every feature-value combination:
instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72
--------------------------------------------------------------------------
42          True       False       False
43          False      True        False
44          False      False       True
...
This is likely to be extremely sparse, and you might use a sparse
representation accordingly. As several folks have suggested, the
representation in the database needn't be the same as in your code.
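
To make that concrete, here is a minimal sketch in Python. It assumes the database hands back one (instance_id, feature, value) row per observed value; that row layout, the sample data, and the names rows and to_dense are hypothetical, not taken from the original poster's schema. It expands the sparse rows into the wide True/False layout shown above.

```python
# Hypothetical sparse rows, one per observed feature value.
rows = [
    (42, "color", "red"),
    (43, "color", "blue"),
    (44, "color", "yellow"),
]


def to_dense(rows):
    """Expand (instance_id, feature, value) rows into one-hot indicator columns."""
    # Column order is the sorted set of feature=value pairs that actually occur.
    columns = sorted({f"{feature}={value}" for _, feature, value in rows})
    index = {name: i for i, name in enumerate(columns)}
    dense = {}
    for instance_id, feature, value in rows:
        # Each instance starts as all False; only observed pairs are set to True.
        row = dense.setdefault(instance_id, [False] * len(columns))
        row[index[f"{feature}={value}"]] = True
    return columns, dense


columns, dense = to_dense(rows)
```

The wide matrix then exists only in memory, for as long as the regression code needs it, while the stored form grows with the number of observed values rather than with the number of possible feature-value combinations.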
Even SPSS, the best-known statistics software, uses the same approach
and data structure that my software uses.
I probably should use another data structure, but it would not be as
efficient and practical as the one I use now.
The point is that, if you want to use Postgres, this is in fact neither
efficient nor practical. Indeed, mapping from a sparse DB representation
to your internal data structures may well be =more= efficient than
naively using the same representation in both places.
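
As a rough illustration of such a mapping, the sketch below stores observations in a tall three-column table and reads them back into per-instance dictionaries. Everything here is an assumption made for the example: the table name observation, its columns, and the sample values are hypothetical, and Python's built-in sqlite3 module stands in for Postgres only so that the snippet runs on its own.

```python
import sqlite3

# In-memory SQLite is a stand-in for Postgres here (assumption for the sketch).
conn = sqlite3.connect(":memory:")

# Hypothetical tall table: one row per (instance, feature) actually observed.
conn.execute(
    "CREATE TABLE observation ("
    " instance_id INTEGER NOT NULL,"
    " feature TEXT NOT NULL,"
    " value TEXT NOT NULL,"
    " PRIMARY KEY (instance_id, feature))"
)
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [
        (42, "color", "red"), (42, "height", "71"),
        (43, "color", "blue"), (43, "height", "72"),
        (44, "color", "yellow"),  # no height recorded, so no row is stored
    ],
)

# Map the sparse rows into the program's internal structure.
instances = {}
for instance_id, feature, value in conn.execute(
    "SELECT instance_id, feature, value FROM observation ORDER BY instance_id"
):
    instances.setdefault(instance_id, {})[feature] = value

conn.close()
```

A missing value costs nothing in this layout, and adding a new feature value needs no change to the table definition.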
- John D. Burger
MITRE