We currently have a database table that's laid out something like this:
id int
date1 date
belongs_to date
type varchar(1)
type_fk int
start_time time
end_time time
location_fk int
department_fk int
value decimal
Where each row represents some data throughout the day (96 data points for each 15-minute period) - and each "type_fk", department, and location can have up to say, 3 rows for a given start / end time and date (based on the "type").
This table has rapidly grown - we're adding about 1 - 2 million rows per month - and almost all of our queries actually sum up the values based on the belongs_to date and the location_id, however, for other statistics we need to keep the values separate. The db is now more than 60% of our database, and we want to come up with a better way to store it. (To speed up other queries, we actually roll this table up into a daily table).
We're considering changing the structure of this table into one of the following structures:
Option [A]:
id int
date1 date
belongs_to date
type_fk int
location_fk int
department_fk int
value_type1_0 decimal
....
value_type1_96 decimal
value_type2_0 decimal
....
value_type2_96 decimal
value_type3_0 decimal
....
value_type3_96 decimal
or, as an alternative:
Option [B]:
id int
date1 date
belongs_to date
type varchar(1)
type_fk int
location_fk int
department_fk int
value_type_0 decimal
....
value_type_96 decimal
We're having a hard time choosing between the two options. We'll definitely partition either one by the date or belongs_to column to speed up the queries.
Option A would mean that any given date would only have a single row, with all three "types". However, this table would have 6+96*3 columns, and in many cases at least 96 of those columns would be empty. More often than not, however, at least half of the columns would be empty (most location's aren't open all day).
Option B would only create rows if the type had data in it, but the other 6 columns would be redundant. Again, many of the columns might be empty.
... From a space / size perspective, which option is a better choice?
How does PostgreSQL handle storing empty columns?
Thanks!
--
Anthony