Hallo,
I have a large amount of time based events to aggregate, with a finite list of possible events.
the events are logged one at the time:
timestamp_1 : event_1
timestamp_2 : event_2
timestamp_3 : event_1
...
My idea is to prepare the data before to import them in order to get one separate column per event
along with the desired time aggregation
timestamp : houroffset : is_event_1 :is_event_2 : is_event_3
t1 132 True null null
t2 132 null True null
t3 133 True null null
t4 134 null null True
....
With this model, I can easily count my events per time periode:
select
count(is_event_1) as C1,
count(is_event_2) as C2,
count(is_event_3) as C3
from foo
group by houroffset.
(the real model is more compex as they are different categories and volatile properties associated with the events)
and now my questions:
---------------------
- Will the above query acces the "is_event columns", or get the informmation only from the nulls bitmap within the row headers ?
- How does the splitting of the "event" information affect the row headers ? Should I rather define a single event column and aggregate my data using "case when else end" clauses ?
- is a where clause "where is_event_x is not null" more performant than "where is_event_x is true" as the null bitmap can thoretically be used.
- in my model, I couldn't find a way to build an index that could be used to query a single event:
select ...
from foo
where is_event_1=true (|| is_event_1 is not null)
I first imagined an index like create index foo_i on foo ((is_event_1 IS NOT NULL), (is_event_2 IS NOT NULL),...
But it is apparently not used when I have a single event in my where clause.
A solution would be to add a further column in my table containing the event_id and to index it:
timestamp : houroffset : is_event_1 :is_event_2 : is_event_3 : event_id
t1 132 True null null 1
t2 132 null True null 2
t3 133 True null null 1
t4 134 null null True 3
but this extra column would be redundant with the nulls bitmap. Is there a way to avoid this duplication of information ?
- is there a performance gain at all when columns that are seldom used are placed at the end of the rows ?
- I 'll also be thankful for any comments and critics on my model.
cheers,
Marc