--- Andreas Strasser <kontakt@xxxxxxxxxxxxxxxxxxxx> wrote:

> Hello,
>
> I'm currently designing an application that will retrieve economic data
> (mainly time series) from different sources and distribute it to clients.
> It is supposed to manage around 20,000 different series with differing
> numbers of observations (some have only a few dozen observations, others
> several thousand), and I'm now faced with the decision of where and how
> to store the data.

If you really have such a disparity among your series, then it is a
mistake to blend them into a single table. You need to spend more time
analyzing what the data mean. If one data set comprises the daily close
prices of a suite of stocks or mutual funds, then it makes sense to
include all such series in a single table; but if some of the series are
daily close prices and others are monthly averages, then it is a mistake
to combine them in one table, and two or more tables would be warranted.
Likewise, if the data come from different data feed vendors, you have to
think very carefully about whether the data can logically be combined at
all.

> So far, I've come up with 3 possible solutions:
>
> 1) Storing the observations in one big table with fields for the series,
>    the position within the series and the value (float)
> 2) Storing the observations in an array (either in the same table as the
>    series or in an extra data table)
> 3) Storing the observations in CSV files on the hard disk and only
>    putting a reference to them in the database

I don't much like any of the above. When I have had to process data for
financial consultants, I applied a few simple filters to ensure the data
were clean (e.g. tests to ensure the data had not been corrupted during
transmission, proper handling of missing values, &c.), and then
bulk-loaded the data into a suite of tables designed specifically to
match the vendor's definitions of what the data mean. Only then did we
apply specific analyses, designed in consultation with the financial
consultant's specialists: the folk best qualified to help us understand
the data, and especially how it can be combined in a meaningful way. If
the data are stored in a suite of well-defined tables, subsequent
analyses are much more easily designed, implemented and executed.

I do not know whether PostgreSQL, or any other RDBMS, includes the
ability to call on software such as "R" to do specific statistical
analysis. If I had to do some time series analysis, I would do it in a
client application that retrieves the appropriate data from the database
and either does the analysis in custom code I have written (usually in
C++, as some of my favourite analyses have not made it into commonly
available open source or commercial statistical software) or invokes the
appropriate functions from statistical software I have at my disposal.
The strategy I describe above makes the SQL required for much of this
dirt simple.

HTH

Ted
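P.S. As a rough sketch only (the table and column names below are made
up for illustration, not taken from any real feed), the kind of
per-vendor, per-frequency layout I mean might look something like this:

    -- One table per vendor and observation frequency, matching that
    -- vendor's definition of the data, instead of one big generic table.
    CREATE TABLE acme_daily_close (
        series_id   integer       NOT NULL,  -- which instrument
        obs_date    date          NOT NULL,  -- trading day
        close_price numeric(12,4) NOT NULL,  -- vendor-supplied close
        PRIMARY KEY (series_id, obs_date)
    );

    CREATE TABLE acme_monthly_average (
        series_id   integer       NOT NULL,
        obs_month   date          NOT NULL,  -- first day of the month
        avg_value   numeric(12,4) NOT NULL,  -- vendor-supplied average
        PRIMARY KEY (series_id, obs_month)
    );

    -- Cleaned files can then be bulk loaded, e.g. with COPY:
    -- COPY acme_daily_close FROM '/path/to/cleaned_daily.csv' WITH CSV;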
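And to give a feel for what I mean by "dirt simple": pulling one series
back out for a client-side analysis is then just a query along the lines
of (again, the identifiers are only placeholders):

    SELECT obs_date, close_price
      FROM acme_daily_close
     WHERE series_id = 42
       AND obs_date BETWEEN '2006-01-01' AND '2006-12-31'
     ORDER BY obs_date;

The client code (C++, R, or whatever you prefer) then only has to walk
an ordered result set; none of the frequency-mixing logic leaks into the
analysis layer.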