On Thursday 05 January 2006 15:12, Qingqing Zhou wrote:
> "Mark Liberman" <mliberman@xxxxxxxxxxxxxxxx> wrote
> >
> > First run, after a night of inactivity:
> >
> >    ->  Bitmap Index Scan on 1min_events_file_id_begin_idx
> >        (cost=0.00..37.85 rows=3670 width=0)
> >        (actual time=313.468..313.468 rows=11082 loops=1)
> >          Index Cond: (file_id = 137271)
> >  Total runtime: 313.643 ms
> >
> > Second run, after that:
> >
> >    ->  Bitmap Index Scan on 1min_events_file_id_begin_idx
> >        (cost=0.00..37.85 rows=3670 width=0)
> >        (actual time=2.106..2.106 rows=11082 loops=1)
> >          Index Cond: (file_id = 137271)
> >  Total runtime: 2.276 ms
>
> It is clear that the first query takes longer because of the IO time of
> index 1min_events_file_id_begin_idx (see 313.468 vs. 2.106). I am afraid
> there is currently no easy solution for this situation, unless you can
> predict which part of the relation/index your query will use; then you
> can preload or "warm up" the cache for it.
>
> Regards,
> Qingqing

Thanks Qingqing, this actually helped me determine that the compound index,
1min_events_file_id_begin_idx, is not the proper index to use, as it is
based on file_id and begin_time, the latter of which is not involved in the
where clause.  The planner is only using that index to "filter" on the
listed file_id.

Now, my follow-up question / assumption.  I am assuming that the IO time is
so long on that index because it has to read the entire index (for that
file_id) into memory; it cannot restrict the scan to a date range, because
begin_time is not in the where clause.  But if I replaced that compound
index with the proper compound index of file_id / end_time, it should give
performance similar to the scan on 1min_events_end_idx (which was < 1 ms).
That is, the most recently updated rows are more likely to be in the cache,
and the planner is smart enough to read only the index rows that it needs.

Alternatively, I could create a single index on file_id alone (and rely
upon the new bitmap scan capabilities in 8.1).  But I fear that, although
this index will be smaller than the erroneous compound index on file_id /
begin_time, it will still display the same behavior: it will need to read
all rows from that index for the appropriate file_id, and since the data
goes back at one-minute intervals for 60 days, that IO might be large.

Obviously, I will be testing this, but it might take a few days, as I
haven't figured out how to simulate the "period of inactivity" that gets
the data flushed out of the cache, so I can only run the test each morning.
Any confirmation / corrections to my assumptions are greatly appreciated.
In particular, is the compound index the way to go, or the solo index on
file_id?

Thanks,

Mark
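
P.S. To make the plan concrete, here is the replacement compound index I
intend to test.  This is a minimal sketch assuming the underlying table is
named "1min_events" (the quoting is needed because the identifiers start
with a digit):

  -- replace the mis-targeted (file_id, begin_time) index
  DROP INDEX "1min_events_file_id_begin_idx";
  CREATE INDEX "1min_events_file_id_end_idx"
      ON "1min_events" (file_id, end_time);

With both where-clause columns covered, the scan should touch only the
index entries for the given file_id whose end_time falls in the requested
range, rather than every entry for that file_id.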
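
The solo-index alternative would look like the following; after creating it
I would run EXPLAIN ANALYZE to see whether 8.1 combines it with
1min_events_end_idx via a BitmapAnd.  The end_time condition below is just
a placeholder, not our real predicate:

  CREATE INDEX "1min_events_file_id_idx" ON "1min_events" (file_id);

  -- check whether the planner BitmapAnds the two single-column indexes
  EXPLAIN ANALYZE
  SELECT *
    FROM "1min_events"
   WHERE file_id = 137271
     AND end_time > now() - interval '1 hour';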
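
And a crude stab at Qingqing's warm-up suggestion: a throwaway query, run
(say, from cron) before the first real query of the morning, to pull the
index and heap pages for the hot file_id back into cache, assuming the
planner uses the index for it:

  -- touches the index and heap pages for file_id 137271
  SELECT count(*) FROM "1min_events" WHERE file_id = 137271;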