I need some advice on storing/retrieving data in large rows. Invariably
someone points out that very long rows are probably poorly normalized,
but I have to deal with how to store a dataset which cannot be changed,
specifically the ~23,000 column US Census American Community Survey.
The Census releases these data in 117 "sequences" of < 256 columns (in
order to be read by spreadsheet applications with a 256 column limit). I
have previously stored each sequence in its own table, which is pretty
straightforward.
My problem is that some of the demographic researchers I work with want
a one-table dump of the entire dataset. This would primarily be for data
transfer. This is of limited actual use in analysis, but nonetheless,
that's what we want to be able to do.
Now, I can't join all the sequences in one SQL query for export because
of the 1600 column limit. So based on previous list activity (Tom Lane:
Perhaps you could collapse multiple similar columns into an array
column?
http://archives.postgresql.org/pgsql-admin/2008-05/msg00211.php), I
decided to try to combine all the sequences into one table using array
columns. (This would actually make querying easier since the users
wouldn't have to constantly JOIN the sequences in their queries.) Next
problem: I run into the 8k row size limit once about half the columns
are populated. As far as I can understand, even though a row
theoretically supports a 1.6TB (!) row size, this only works for
TOASTable data types (primarily text?). The vast majority of the 23k
columns I'm storing are bigint.
Questions:
1) Is there any way to solve problem 1, which is to export the 23k
columns from the database as it is, with 117 linked tables?
2) Is there any way to store the data all in one row? If numeric types
are un-TOASTable, 23k columns will necessarily break the 8k limit even
if they were all smallint, correct?
Regards,
--Lee
--
Lee Hachadoorian
PhD, Earth& Environmental Sciences (Geography)
Research Associate, CUNY Center for Urban Research
http://freecity.commons.gc.cuny.edu
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general