Re: Beyond the 1600 columns limit on Windows

"John D. Burger" <john@xxxxxxxxx> · Tue, 8 Nov 2005 14:14:58 -0500

Evandro's mailing lists (Please, don't send personal messages to this 
address) wrote:

> It has nothing to do with normalisation.  It is a program for
> scientific applications.
> Data values are broken into columns to allow multiple linear regression
> and multivariate regression tree computations.

Having done similar things in the past, I wonder if your current DB 
design includes a column for every feature-value combination:

instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72
--------------------------------------------------------------------------
42          True       False       False
43          False      True        False
44          False      False       True
...

This is likely to be extremely sparse, and you might use a sparse 
representation accordingly.  As several folks have suggested, the 
representation in the database needn't be the same as in your code.
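A minimal sketch of the sparse idea, assuming each instance is held in memory as a dict of feature -> value (a hypothetical layout for illustration, not Evandro's actual one): instead of one Boolean column per feature=value pair, keep only the (instance_id, feature, value) facts that are actually present. In Postgres this would correspond to a narrow table with one row per triple; here plain tuples stand in for those rows, and the names `Triple` and `to_triples` are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical row type standing in for a narrow table with
# columns (instance_id, feature, value).
Triple = Tuple[int, str, str]


def to_triples(instances: Dict[int, Dict[str, str]]) -> List[Triple]:
    """Flatten per-instance feature dicts into sparse (id, feature, value) rows.

    Only observed feature values are stored, so the row count grows with
    the data actually present, not with every possible feature=value column.
    """
    rows: List[Triple] = []
    for instance_id in sorted(instances):
        for feature, value in sorted(instances[instance_id].items()):
            rows.append((instance_id, feature, value))
    return rows


instances = {
    42: {"color": "red", "height": "71"},
    43: {"color": "blue", "height": "72"},
    44: {"color": "yellow", "height": "71"},
}
rows = to_triples(instances)
print(len(rows))  # 6: one row per observed feature value
```

However many distinct colors or heights show up, the table keeps three columns, so the 1600-column limit never comes into play.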

> Even SPSS, the best-known statistics software, uses the same approach and
> data structure that my software uses.
> Perhaps I should use another data structure, but it would not be as
> efficient and practical as the one I use now.

The point is that, if you want to use Postgres, this design is not
actually efficient or practical.  In fact, it may well be that mapping
from a sparse DB representation to your internal data structures is
=more= efficient than naively using the same representation in both
places.
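A hedged sketch of that mapping, assuming the database holds sparse (instance_id, feature, value) rows and the regression code wants a dense 0/1 indicator matrix with one column per feature=value combination. The function name `to_indicator_matrix` and the list-of-lists output are hypothetical; in practice the matrix would be handed to whatever numeric code the application already uses.

```python
from typing import List, Tuple

# Hypothetical row type: (instance_id, feature, value) as fetched from a narrow table.
Triple = Tuple[int, str, str]


def to_indicator_matrix(
    rows: List[Triple],
) -> Tuple[List[int], List[str], List[List[int]]]:
    """Expand sparse (id, feature, value) rows into a dense 0/1 matrix.

    Columns are the distinct "feature=value" labels actually observed,
    mirroring the one-column-per-combination layout, but built in memory
    instead of being stored as table columns.
    """
    ids = sorted({r[0] for r in rows})
    labels = sorted({f"{feature}={value}" for _, feature, value in rows})
    row_index = {iid: i for i, iid in enumerate(ids)}
    col_index = {label: j for j, label in enumerate(labels)}

    # Start with all zeros, then set a 1 for each observed combination.
    matrix = [[0] * len(labels) for _ in ids]
    for iid, feature, value in rows:
        matrix[row_index[iid]][col_index[f"{feature}={value}"]] = 1
    return ids, labels, matrix


rows = [
    (42, "color", "red"),
    (43, "color", "blue"),
    (44, "color", "yellow"),
]
ids, labels, matrix = to_indicator_matrix(rows)
print(labels)  # ['color=blue', 'color=red', 'color=yellow']
print(matrix)  # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

The wide layout exists only while the computation runs, so the number of feature=value columns is bounded by memory rather than by a per-table column limit.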

- John D. Burger
  MITRE
