Re: Help in avoiding a query 'Warm-Up' period/shared buffer cache

On Thursday 05 January 2006 15:12, Qingqing Zhou wrote:
> "Mark Liberman" <mliberman@xxxxxxxxxxxxxxxx> wrote
>
> > First run, after a night of inactivity:
> >
> >                     ->  Bitmap Index Scan on
> > 1min_events_file_id_begin_idx (cost=0.00..37.85 rows=3670 width=0)
> > (actual time=313.468..313.468 rows=11082
> > loops=1)
> >                           Index Cond: (file_id = 137271)
> > Total runtime: 313.643 ms
> >
> > Second run, after that:
> >
> >                     ->  Bitmap Index Scan on
> > 1min_events_file_id_begin_idx (cost=0.00..37.85 rows=3670 width=0)
> > (actual time=2.106..2.106 rows=11082 loops=1)
> >                           Index Cond: (file_id = 137271)
> > Total runtime: 2.276 ms
>
> It is clear that the first query takes longer because of the I/O time
> on index 1min_events_file_id_begin_idx (see 313.468 vs. 2.106 ms). I am
> afraid there is currently no easy solution for this situation, unless
> you can predict which part of the relation/index your query will use;
> then you can preload or "warm up" the cache for it.
>
> Regards,
> Qingqing


Thanks Qingqing, 

this actually helped me determine that the compound index, 
1min_events_file_id_begin_idx, is not the proper index to use: it is built 
on file_id and begin_time, the latter of which is not involved in the where 
clause.  The planner is only using that index to "filter" on the listed file_id.
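
To make the mismatch concrete, here is roughly what the definitions look 
like on my end (abbreviated, and the column types are approximate):

    -- Abbreviated table plus the existing compound index:
    CREATE TABLE "1min_events" (
        file_id    integer   NOT NULL,
        begin_time timestamp NOT NULL,
        end_time   timestamp NOT NULL
        -- ... other columns omitted ...
    );
    CREATE INDEX "1min_events_file_id_begin_idx"
        ON "1min_events" (file_id, begin_time);

    -- The query only constrains file_id (and end_time), e.g.:
    --   WHERE file_id = 137271 AND end_time > ...
    -- so the (file_id, begin_time) index can only be descended on its
    -- leading file_id column, and every entry for that file_id gets read.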

Now, my follow-up question / assumption.  I am assuming the IO time is so 
long on that index because it has to read every index entry for that 
file_id into memory; it cannot narrow the scan to a date range, because 
begin_time is not in the where clause.
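
One way I plan to sanity-check that assumption is the block I/O statistics 
view (this requires stats_block_level to be enabled, if I read the docs 
correctly):

    SELECT indexrelname, idx_blks_read, idx_blks_hit
      FROM pg_statio_user_indexes
     WHERE indexrelname = '1min_events_file_id_begin_idx';

    -- idx_blks_read counts index blocks read from disk and idx_blks_hit
    -- counts buffer-cache hits; if a cold run bumps idx_blks_read by
    -- roughly the whole per-file_id slice of the index, the assumption
    -- above holds.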

But, if I replaced that compound index with the proper compound index on 
file_id / end_time, I expect it would give performance similar to the scan 
on 1min_events_end_idx (which was < 1 ms): the most recently updated rows 
are more likely to be in the cache, and the scan is smart enough to read 
only the index entries it needs.
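
In other words, the change I have in mind is just this (the new index name 
is my own invention):

    DROP INDEX "1min_events_file_id_begin_idx";
    CREATE INDEX "1min_events_file_id_end_idx"
        ON "1min_events" (file_id, end_time);

    -- With both file_id and end_time in the index, the scan can descend
    -- straight to the narrow range the query asks for instead of walking
    -- every index entry for that file_id.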

Alternatively, I could create a single index on file_id alone (and rely upon 
the new bitmap scan capabilities in 8.1).  But I fear that, although this 
index will be smaller than the erroneous compound index on file_id / 
begin_time, it will still show the same behavior: it will need to read all 
of the index entries for the appropriate file_id, and since the data goes 
back one row per minute for 60 days, that IO could be large.
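
For comparison, the single-column variant, plus a quick size check of the 
two candidates afterwards (again, the index name is my own):

    CREATE INDEX "1min_events_file_id_idx" ON "1min_events" (file_id);

    SELECT relname, pg_relation_size(oid) AS size_bytes
      FROM pg_class
     WHERE relname IN ('1min_events_file_id_idx',
                       '1min_events_file_id_end_idx');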

Obviously, I will be testing this - but it might take a few days, as I 
haven't figured out how to simulate the "period of inactivity" needed to get 
the data flushed out of the cache ... so I can only run this test each 
morning.  But any confirmation / corrections to my assumptions are greatly 
appreciated.  E.g., is the compound index the way to go, or the solo index 
on file_id?

Thanks,

Mark


