
Re: huge price database question..


On 03/20/2012 06:50 PM, Jim Green wrote:
On 20 March 2012 21:40, David Kerr <dmk@xxxxxxxxxxxxxx> wrote:
On 03/20/2012 04:27 PM, Jim Green wrote:

Greetings list!
I am pretty new to PostgreSQL, coming from MySQL. I did a fairly extensive
search of the list and came up with a few good threads, but I didn't find
the exact situation I have now, so I am venturing to ask here.

I have daily one-minute stock price data from 2005 onward, each day with
columns timestamp, open, high, low, close, volume and a few more. Each
day's data is about 1.2 million rows. I want to import all of it into
PostgreSQL and analyze it with R, using RPostgreSQL.

Right now I have about 7000 tables, one per individual stock, and I use
Perl to do inserts; it's very slow. I would like to use COPY or another
bulk-loading tool to load the daily raw gz data, but I would need to split
that file into per-stock files before bulk loading, which I consider a bit
messy.

I would like advice on the following idea: store everything in one big
table, partitioned by month (this gives a reasonable number of partitions),
and bulk load the daily file into it. My queries would mostly be selects on
a particular symbol on a particular day.
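
A minimal sketch of such a monthly layout, assuming table inheritance with
CHECK constraints (the partitioning scheme available in the PostgreSQL 9.x
releases of the time) and hypothetical table, column, and index names:

#!/usr/bin/perl
# Hypothetical sketch only: one parent table plus one child table per month,
# using inheritance and CHECK constraints. Names are illustrative.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=quotes', 'jim', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Parent table: holds no rows itself, only defines the schema.
$dbh->do(q{
    CREATE TABLE ticks (
        symbol text        NOT NULL,
        ts     timestamptz NOT NULL,
        open   numeric,
        high   numeric,
        low    numeric,
        close  numeric,
        volume bigint
    )
});

# One child per month; the CHECK constraint lets constraint exclusion
# skip the other months when a query filters on ts.
$dbh->do(q{
    CREATE TABLE ticks_2012_03 (
        CHECK (ts >= '2012-03-01' AND ts < '2012-04-01')
    ) INHERITS (ticks)
});

# Index matching the expected access pattern: one symbol on one day.
$dbh->do(q{
    CREATE INDEX ticks_2012_03_symbol_ts_idx
        ON ticks_2012_03 (symbol, ts)
});

$dbh->disconnect;

With that layout, a query such as SELECT * FROM ticks WHERE symbol = 'XYZ'
AND ts >= '2012-03-05' AND ts < '2012-03-06' only has to touch the March
child table and its index.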

Going forward, I will also import each new day's data into the database.

My hardware is 16 GB RAM and a 4x 5400 rpm RAID 10 array with enough space.

Thanks!

Jim.


Seems like you'd want to do this?
http://search.cpan.org/~turnstep/DBD-Pg-2.19.2/Pg.pm#pg_putcopydata
COPY support

DBD::Pg allows for quick (bulk) reading and storing of data by using the
COPY command. The basic process is to use $dbh->do to issue a COPY command,
and then to either add rows using "pg_putcopydata", or to read them by using
"pg_getcopydata".
Thanks! Would you comment on the table setup as well?

Jim.


\copy on 1.2 million rows should only take a minute or two. You could also make that table UNLOGGED to speed it up more. If you can truncate / drop / create / load / then index the table each
time, you'll get the best throughput.
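
A hedged sketch of that truncate / load / index cycle in Perl, assuming an
UNLOGGED staging table (PostgreSQL 9.1 or later) with the same hypothetical
names as above; this is only one way to arrange it:

#!/usr/bin/perl
# Sketch: reload an UNLOGGED staging table each day, building the index
# only after the bulk load so the COPY itself stays fast.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=quotes', 'jim', '',
                       { RaiseError => 1, AutoCommit => 0 });

$dbh->do(q{
    CREATE UNLOGGED TABLE IF NOT EXISTS ticks_staging (LIKE ticks)
});
$dbh->do('DROP INDEX IF EXISTS ticks_staging_symbol_ts_idx');
$dbh->do('TRUNCATE ticks_staging');

# Bulk load the day's file via COPY.
$dbh->do('COPY ticks_staging FROM STDIN');
open my $fh, '<', 'daily_ticks.tsv' or die "cannot open input: $!";
while (my $line = <$fh>) {
    $dbh->pg_putcopydata($line);
}
close $fh;
$dbh->pg_putcopyend();

# Index after loading, not before.
$dbh->do(q{
    CREATE INDEX ticks_staging_symbol_ts_idx
        ON ticks_staging (symbol, ts)
});

$dbh->commit;
$dbh->disconnect;

Keep in mind that UNLOGGED tables are not crash-safe: they are truncated
after a crash, so the raw daily files remain the recovery path.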

Dave




