Events tables, model discussion in regards to the performances

"Marc Mamin" <M.Mamin@xxxxxxxxxxxx> · Mon, 11 Dec 2006 13:39:50 +0100

Title: Events tables, model discussion in regards to the performances

Hallo,

I have a large amount of time based events to aggregate, with a finite list of possible events.

the events are logged one at the time:

timestamp_1 : event_1

timestamp_2 : event_2

timestamp_3 : event_1

...

My idea is to prepare the data before to import them in order to get one separate column per event

along with the desired time aggregation

timestamp : houroffset : is_event_1 :is_event_2 : is_event_3

t1                 132         True        null         null

t2                 132         null        True         null

t3                 133         True        null         null

t4                 134         null        null         True

....

With this model,  I can easily count my events per time periode:

select 

  count(is_event_1) as C1,

  count(is_event_2) as C2,

  count(is_event_3) as C3

from foo

group by houroffset.

(the real model is more compex as they are different categories and volatile properties associated with the events)

and now my questions:

---------------------

- Will the above query acces the "is_event columns", or get the informmation only from the nulls bitmap within the row headers ?

- How does the splitting of the "event" information affect the row headers ?  Should  I rather define a single event column and aggregate my data using "case when else end" clauses ?

- is a where clause "where is_event_x is not null" more performant than "where is_event_x is true" as the null bitmap can thoretically be used.

- in my model, I couldn't find a way to build an index that could be used to query a single event:

   select ...

   from foo

   where is_event_1=true (|| is_event_1 is not null)

  I first imagined an index like create index foo_i on foo ((is_event_1 IS NOT NULL),  (is_event_2  IS NOT NULL),...

  But it is apparently not used  when I have a single event in my where clause.

  A solution would be to add a further column in my table containing the event_id and to index it:

  timestamp : houroffset : is_event_1 :is_event_2 : is_event_3 : event_id

  t1                 132         True        null         null          1

  t2                 132         null        True         null          2

  t3                 133         True        null         null          1

  t4                 134         null        null         True          3

  but this extra column would be redundant with the nulls bitmap. Is there a way to avoid this duplication of information ?

- is there a performance gain at all when columns that are seldom used are placed at the end of the rows ?

- I 'll also be thankful for any comments and critics on my model.  

cheers,

Marc