
Re: huge price database question..

On 03/20/2012 04:27 PM, Jim Green wrote:
> Greetings list!
> I am pretty new to postgresql, coming from mysql.

Welcome.

> I have daily minute stock price data from 2005 on, and each day has
> columns timestamp, open, high, low, close, volume and a few more. Each
> day's data is about 1.2 million rows. I want to import all the data into
> postgresql and analyze it using R with the help of RPostgreSQL.

> right now I have about 7000 tables, one per stock, and I use
> perl to do inserts; it's very slow.
I'm not sure I understand - please expand and clarify.

If you have 7000 stocks, 1.2 million rows/day is only about 171 observations/stock/day (1,200,000 / 7,000), or a little under 3 hours of data at 60 samples/hour. Are there holes in the data, or am I completely missing the setup?

For the NYSE you have 52 weeks/year, 5 trading days/week (minus up to 8 or so trading holidays), and 6.5 trading hours/day: roughly 251 days × 390 minutes ≈ 98,000, call it 100,000 rows/stock/year, give or take depending on the holiday schedule. Over your time range (2005 to now), that will be less than 1 million rows per stock.

Without holes in the data, it seems you will be adding about 2.73 million rows per trading day (7,000 stocks × 390 minutes).

Is perl doing individual record inserts? Short of reconnecting for each insert, that is the slowest option. Each insert is its own transaction and, unless you are using a RAID card with writeback enabled (hopefully with battery-backed cache if so), you will hit limitations imposed by your disks' rotational speed. If you can ensure the data is sorted by stock, you can start a transaction, write all the records for that stock, then commit the transaction, which should speed up the operation considerably.
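To illustrate the batching idea in SQL (the table and column names here are invented for the example; your perl code would issue the same statements through DBI with AutoCommit turned off):

    BEGIN;
    -- all of one stock's minute bars for the day in a single transaction
    INSERT INTO px_aapl (ts, open, high, low, close, volume)
        VALUES ('2012-03-19 09:30:00-04', 601.10, 601.55, 600.95, 601.25, 15200);
    INSERT INTO px_aapl (ts, open, high, low, close, volume)
        VALUES ('2012-03-19 09:31:00-04', 601.25, 601.40, 601.00, 601.10, 9800);
    -- ... remaining rows for this stock ...
    COMMIT;

One fsync at COMMIT instead of one per row is where the speedup comes from.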

Only because you are bulk-adding historical data and can rebuild your database if it is destroyed, you could consider turning off fsync while you import the backlog. Dropping indexes while you play catch-up can help as well, provided you recreate them afterward.
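For example (a sketch only: fsync is a server-wide setting in postgresql.conf, takes effect on a configuration reload, and should be turned back on the moment the historical load is done, since a crash with fsync off can corrupt the cluster; the index and table names are hypothetical):

    # postgresql.conf, during the one-time historical import only
    fsync = off

    -- in SQL, around the bulk load
    DROP INDEX IF EXISTS px_minute_symbol_ts_idx;
    -- ... bulk load here ...
    CREATE INDEX px_minute_symbol_ts_idx ON px_minute (symbol, ts);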

> I would like to use COPY or another bulk-loading tool to load the
> daily raw gz data, but I would need to split the file into per-stock
> files first before bulk loading. I consider this a bit messy.

> I would like advice on the following idea:
> store everything in one big table, partitioned by month (this gives a
> reasonable number of partitions), and bulk load the daily
> file. My queries would mostly be selects on a particular symbol on
> a particular day.
If you already have daily files, you might want to work out the process of importing a day and apply that to your backlog. There are lots of options - you might try experimenting.

Given the type of queries you plan, simple partitioning by time period as you described is a reasonable approach.
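In the PostgreSQL versions current as of this thread, partitioning is done with table inheritance plus CHECK constraints; a minimal sketch, with invented names:

    CREATE TABLE px_minute (
        symbol text        NOT NULL,
        ts     timestamptz NOT NULL,
        open   numeric,
        high   numeric,
        low    numeric,
        close  numeric,
        volume bigint
    );

    -- one child table per month, constrained so the planner can skip it
    CREATE TABLE px_minute_2012_03 (
        CHECK (ts >= '2012-03-01' AND ts < '2012-04-01')
    ) INHERITS (px_minute);

With constraint_exclusion enabled (the default setting of 'partition' suffices), a query that filters on ts will only touch the relevant month's table.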

You could import a day's data as a bulk copy and then select from that table into the various individual stock tables. BTW, you have enough RAM that the daily bulk file will be cached, so the queries on each individual stock should be quite fast. But you will need to wrap things in a transaction, or otherwise ensure that you can recover if things fail part-way through that distribution of data.
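Roughly like this (the staging table, file path, and per-stock table names are all made up for the sketch):

    BEGIN;
    CREATE TEMP TABLE daily_staging (LIKE px_minute) ON COMMIT DROP;
    COPY daily_staging FROM '/data/minute/2012-03-19.csv' WITH (FORMAT csv);
    INSERT INTO px_aapl SELECT * FROM daily_staging WHERE symbol = 'AAPL';
    INSERT INTO px_ibm  SELECT * FROM daily_staging WHERE symbol = 'IBM';
    -- ... one INSERT ... SELECT per stock, perhaps generated from a symbol list ...
    COMMIT;

Because everything is inside one transaction, a failure part-way through leaves nothing behind.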

You could partition your data by stock symbol and use a trigger on the parent table to put each row into the correct child table on insert. I am unsure how fast this will run - try it and test. One benefit is that your daily import will fully succeed or fully fail. But you will need to update your partitioning and triggers to deal with new stocks. You can, of course, have the trigger build the child-table name from the stock symbol; that eliminates the need to alter the trigger code for each new stock, but it is still best to create the child tables in advance.
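A sketch of such a routing trigger (it assumes child tables named like px_aapl already exist, one per symbol; adapt the naming to taste):

    CREATE OR REPLACE FUNCTION route_px_insert() RETURNS trigger AS $$
    BEGIN
        -- build the child-table name from the symbol and insert there
        EXECUTE format('INSERT INTO %I SELECT ($1).*',
                       'px_' || lower(NEW.symbol))
        USING NEW;
        RETURN NULL;  -- cancel the insert into the parent itself
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER px_route BEFORE INSERT ON px_minute
        FOR EACH ROW EXECUTE PROCEDURE route_px_insert();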


> my hardware is 16G RAM, 4x 5400rpm drives in RAID 10, with enough space.
Have you started with basic tuning? It is likely that whatever stock PostgreSQL configuration you have installed is suboptimal. (What PG version and OS are you using?)
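For a 16GB machine, rough starting points might look like the following (illustrative values only; the right numbers depend on your version and workload):

    # postgresql.conf -- starting points, not tuned values
    shared_buffers       = 4GB    # often ~25% of RAM
    effective_cache_size = 12GB   # roughly what the OS will cache
    maintenance_work_mem = 512MB  # speeds up CREATE INDEX after bulk loads
    checkpoint_segments  = 32     # fewer, larger checkpoints during loads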

Cheers,
Steve


