At 12/19/2006 11:01 PM, Ashley M. Kirchner wrote:
I'm starting to log weather data to a database and I'm trying to
figure out the best way to create the tables. The reports come in
every minute of every hour, 24 hours a day. Eventually, I'd like
to do some calculations on the statistics, displaying daily values
(which can be broken down into hourly values), but also daily and
monthly averages.
To me, it doesn't make sense to dump everything into one big
table, but I can't figure out the best way to break it down either.
Keep in mind that the only data I have is what comes in for that
minute. The daily averages I have to calculate myself (later). But
I can't see one large table being very effective when it comes to
calculating that stuff.
So, how should I break the tables down? Create a new table
every day (20061219_data, 20061220_data, etc.) and insert all the
values into it? Or break it down per value (temp_table,
humidity_table, etc.) and insert daily data into them?
(This question doesn't pertain to PHP but to database techniques; you
may get better and more friendly advice on a MySQL list.)
I'm curious, why doesn't it make sense to you to keep all the data in
one big table? MySQL is certainly robust enough to keep a whack of
data together. Only when table size becomes a problem, say with the
practicality of backup or the speed of queries or the size of the
hard drive, do you need to worry about breaking it down into smaller
chunks. But every database has its limits and you're smart to decide
up front how to split it up.
A major factor in how you choose to store your data should be how it
will be used. What kinds of queries will be most common? What
time-spans do they cover? Do they usually interrogate just one
parameter, e.g. either temperature or humidity but not both, or do
they often query two or more parameters in search of correlations?
Without knowing more, my first tendency would be to keep all the data
in a single table. One table would actually occupy less disk space
than splitting the data into parallel tables because some fields
would need to be duplicated in every table (timestamp, record id,
perhaps location, etc.). I might choose to split the data into one
table per year for ease of backup and archiving. Another approach is
to let up to 5 or 10 years of data accumulate in a single table,
then archive (copy out & delete) the oldest year's data every year to
keep the table size manageable.
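
For what it's worth, a single-table layout along those lines might
look something like the sketch below. The table and column names
(weather_data, observed_at, temperature, and so on) are only
illustrations for your minute-by-minute reports, not a
recommendation for your particular station, and the archive
statements assume a calendar-year cutoff:

  -- One row per minute of observations; columns are illustrative.
  CREATE TABLE weather_data (
      observed_at  DATETIME NOT NULL,   -- minute the report came in
      temperature  DECIMAL(5,2),
      humidity     DECIMAL(5,2),        -- percent relative humidity
      pressure     DECIMAL(7,2),
      PRIMARY KEY (observed_at)
  );

  -- Archiving the oldest year: copy it out, then delete it.
  CREATE TABLE weather_data_2006 LIKE weather_data;
  INSERT INTO weather_data_2006
      SELECT * FROM weather_data WHERE observed_at < '2007-01-01';
  DELETE FROM weather_data WHERE observed_at < '2007-01-01';
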
The daily averages I have to calculate myself (later). But I can't
see one large table being very effective when it comes to
calculating that stuff.
I believe it will be more efficient to calculate averages from a
single table than from multiple tables. In both cases the database
engine has to select the same fields to calculate the averages, but
if the data is split into separate tables the engine will have to
query each table separately and then combine the results.
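
As a rough sketch, assuming the single-table layout above (a
DATETIME column called observed_at plus one column per reading),
daily and monthly averages come straight out of GROUP BY:

  -- Daily averages for each parameter.
  SELECT DATE(observed_at) AS day,
         AVG(temperature)  AS avg_temp,
         AVG(humidity)     AS avg_humidity
  FROM weather_data
  GROUP BY DATE(observed_at);

  -- Monthly averages, same idea.
  SELECT DATE_FORMAT(observed_at, '%Y-%m') AS month,
         AVG(temperature) AS avg_temp
  FROM weather_data
  GROUP BY DATE_FORMAT(observed_at, '%Y-%m');

With the per-day or per-parameter tables you describe, the same
calculation turns into a UNION across every table involved, which
is what makes the single table simpler to work with.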
Regards,
Paul
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php