Ted Byers wrote:
If you really have such a disparity among your series,
then it is a mistake to blend them into a single
table. You really need to spend more time analyzing
what the data means. If one data set is comprised of
the daily close price of a suite of stocks or mutual
funds, then it makes sense to include all such series
in a given table, but if some of the series are daily
close price and others are monthly averages, then it
is a mistake to combine them in a single table, and
two or more would be warranted. Or if the data are
from different data feed vendors, then you have to
think very carefully whether or not the data can
logically be combined.
> I don't much like any of the above. When I have had
> to process data for financial consultants, I applied a
> few simple filters to ensure the data is clean (e.g.
> tests to ensure data hasn't been corrupted during
> transmission, proper handling of missing data, &c.),
> and then bulk loaded the data into a suite of tables
> designed specifically to match the vendor's
> definitions of what the data means. Only then did we
> apply specific analyses designed in consultation with
> the financial consultant's specialists; folk best
> qualified to help us understand how best to understand
> the data and especially how it can be combined in a
> meaningful way.
Thank you for your long and detailed answer. It would be possible for me
to group those series by frequency or vendor (or even by size), but this
would make sql-queries really complicated. Grouping the series by theme
would probably work but be a real pain and i doubt whether this has a
real advantage for my application.
The real analysis of the data will happen on the client-side, the server
has only to ensure that the time series are up to date and not corrupted
- it doesn't care about the meaning of the value of any given field (I
was probably a little bit unclear about that). It will (with the help of
R - via PL/R) do a few simple calculations and transformations, but
those can be applied without much knowledge about the properties of the
data.
I do not know if PostgreSQL, or any other RDBMS,
includes the ability to call on software such as "R"
to do specific statistical analysis, but if I had to
do some time series analysis, I would do it in a
client application that retrieves the appropriate data
from the database and either does the analysis in
custom code I have written (usually in C++, as some of
my favourite analyses have not made it into commonly
available open source or commercial statistical
software) or invokes the appropriate functions from
statistical software I have at my disposal. The
strategy I describe above makes the SQL required for
much of this dirt simple.
Exactly, i'm looking for a simple and clean way to store data(possibly
transform its frequency) and serve it to the clients (in different
formats).
A typical request would be "Give me quarterly GDP and quarterly Import
figures for the US between 1987-01 to 2007-2 plus the effective federal
funds rate (transformed to quarterly averages) for the same period".
The server therefore has to grab the values for the first two series
(both are much longer - go back to the 1930's or 40's), extract the
values for the requested period (pad it if it's too short). It will do
the same for the third series and transform it to quarterly values (only
available for months/weeks). Finally it will return a table with a
date-column and 3 columns for the requested data.
Most of the stuff will be done in stored procedures, since i don't want
to mess around with it in php (which acts as as proxy for the clients).
Andreas
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org/