On Apr 17, 2015 8:35 AM, "Kynn Jones" <kynnjo@xxxxxxxxx> wrote:
> (The only reason for wanting to transfer this data to a Pg table
> is the hope that it will be easier to work with it by using SQL
800 million 8-byte numbers (about 6.4 GB) doesn't seem totally unreasonable for Python/R/Matlab, if you have a lot of memory. Are you sure you want it in Postgres? Load the file once, then filter it as you like. If you don't have the memory, I can see how using Postgres to fetch fewer rows at a time might help. Fetching fewer columns at a time would help even more, if that's possible.
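Here is a minimal sketch of the load-once-then-filter approach in Python. It assumes the data sits in a headerless binary file of little-endian float64 values, 400 per row in row-major order. The file layout and the helper names `open_matrix` and `take_subset` are my assumptions, not anything from your setup. `numpy.memmap` lets you pull just the rows and columns you need without reading the whole 6.4 GB into RAM:

```python
import numpy as np

N_COLS = 400  # assumption: 400 doubles per row, as described in the thread

def open_matrix(path, n_cols=N_COLS):
    """Memory-map a headerless little-endian float64 file as (n_rows, n_cols).

    Assumes a raw binary dump in row-major order; nothing is read into RAM
    until you index into the result.
    """
    mm = np.memmap(path, dtype="<f8", mode="r")
    if mm.size % n_cols:
        raise ValueError("file size is not a multiple of the row width")
    return mm.reshape(-1, n_cols)

def take_subset(matrix, rows, cols):
    """Copy only the selected rows and columns into an in-memory ndarray."""
    return np.asarray(matrix[np.ix_(rows, cols)])
```

For example, `take_subset(m, range(1000), [0, 5, 17])` copies a 1000 x 3 block and leaves the rest of the file on disk. If the data is CSV rather than raw binary, this doesn't apply as-is.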
> In its simplest form, this would mean using
> doubles as primary keys, but this seems to me a bit weird.
I'd avoid that and just include an integer PK with your data. Data frames in the languages above support that, or you can just slice off the PK column before doing your matrix math.
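Slicing off the PK could look something like this in NumPy, assuming the integer id sits in column 0 of the loaded array (`split_ids` is a hypothetical helper). A float64 represents integers exactly up to 2**53, so ids in the hundreds of millions survive being stored alongside the doubles:

```python
import numpy as np

def split_ids(table):
    """Split an (n_rows, 1 + n_values) array into integer ids and a float matrix.

    Assumes column 0 holds the integer PK (stored as float64 only because a
    NumPy array has a single dtype); the remaining columns are the doubles.
    """
    ids = table[:, 0].astype(np.int64)
    values = table[:, 1:]
    return ids, values
```

After that, `values` is a plain float matrix for the math and `ids` tells you which row is which. In pandas the id would typically become the DataFrame index, which `DataFrame.to_numpy()` leaves out.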
Also, instead of 401 columns per row, maybe store all 400 doubles in an array column? Not sure if that's useful for you, but it may be worth considering.
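If you go the array route, loading could look roughly like this Python sketch. It writes COPY text-format lines (tab-delimited) for a hypothetical table `matrix (id bigint PRIMARY KEY, vals double precision[])`; `copy_line` and the table name are made up for illustration, and it assumes all values are finite. Postgres arrays are 1-based, so on the SQL side you could pull a slice with something like `vals[1:10]`.

```python
import math

def copy_line(row_id, values):
    """Format one row for COPY text format into a hypothetical
    (id bigint, vals double precision[]) table.

    repr() round-trips a Python float exactly; non-finite values are
    rejected to keep the sketch simple.
    """
    if not all(math.isfinite(v) for v in values):
        raise ValueError("sketch only handles finite doubles")
    # Postgres array input syntax: {elem,elem,...}
    array_literal = "{" + ",".join(repr(float(v)) for v in values) + "}"
    return f"{int(row_id)}\t{array_literal}\n"
```

You'd write these lines to a file and load it with `COPY ... FROM` or psql's `\copy`.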
Also, if you put the metadata in the same table as the doubles, can you leave off the PKs altogether? Why join if you don't have to? It sounds like the tables are 1-to-1. Even if some of the metadata isn't, maybe you can finesse it with hstore/arrays.
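On the client side the same one-table layout might look like this NumPy structured array: the id, a stand-in metadata field (`label`, hypothetical), and the doubles live in one record, so filtering on metadata is a boolean mask rather than a join. `make_table` is also just an illustration:

```python
import numpy as np

def make_table(ids, labels, values):
    """Build one structured array holding id, metadata, and the doubles.

    'label' stands in for whatever 1-to-1 metadata exists (hypothetical);
    keeping it in the same record means no join is needed to filter.
    """
    n_vals = values.shape[1]
    dtype = np.dtype([("id", "i8"), ("label", "U16"), ("vals", "f8", (n_vals,))])
    table = np.empty(len(ids), dtype=dtype)
    table["id"] = ids
    table["label"] = labels
    table["vals"] = values
    return table
```

Then `table[table["label"] == "control"]["vals"]` gives a 2-D float array of just those rows.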
Good luck!
Paul