Re: Are bitmap index scans slow to start?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 21, 2013 at 8:57 AM, Carlo Stonebanks <stonec.register@xxxxxxxxxxxx> wrote:
(Sorry moderators for any double posts, I keep making subscription errors.
Hopefully this one gets through)

Hi speed freaks,

Can anyone tell me why the bitmap heap scan takes so long to start for this
query? (SQL and EXPLAIN ANALYZE follows).

It is probably reading data from disk.  explain (analyze, buffers) would be more helpful, especially if track_io_timing were also turned on.  In the absence of that, my thoughts below are just best-guesses.
 

The big culprit in this appears to be:
->  Bitmap Index Scan on log_2013_01_session_idx  (cost=0.00..63186.52
rows=2947664 width=0) (actual time=32611.918..32611.918 rows=2772042
loops=1)"
Index Cond: (session_id = 27)"

I can't see anything that occurs between actual time 0.0..32611.918 that
this could be waiting on. Is it building the bitmap?

Yes.  More importantly, it is reading the index data needed in order to build the bitmap.
 

Running the query a second time yields this:

->  Bitmap Index Scan on log_2013_01_session_idx  (cost=0.00..63186.52
rows=2947664 width=0) (actual time=2896.601..2896.601 rows=2772042 loops=1)"
Index Cond: (session_id = 27)"

Does the bitmap then get cached?

No, the bitmap itself doesn't get cached.  But the data needed to construct the bitmap does get cached.  It gets cached by the generic caching methods of PG and the OS, not through something specific to bitmaps.
 
These queries are built dynamically and
called rarely, so their first-time performance is important.

What is going on during the interregnum?  Whatever it is, it seems to be driving the log_2013_01_session_idx index out of the cache, but not the log_2013_01 table.  (Or perhaps the table visit is getting the benefit of effective_io_concurrency?)

Rebuilding the index might help, as it would put all the leaf pages holding values for session_id=27 adjacent to each other, so they would read from disk faster.  But with a name like "session_id", I don't know how long such clustering would last though.

If I'm right about the index disk-read time, then switching to a plain index scan rather than a bitmap index scan would make no difference--either way the data has to come off the disk. 


 
I'd prefer a
strategy that allowed fast performance the first time, rather than slow the
first time and extremely fast subsequently.


So would PG, but it can't find such a strategy.  PG optimizes all top-level queries in isolation, it never assumes you will execute the same query repeatedly and build that assumption into the costing process.  (This is not true of subqueries, where it does account for repeated executions in the cost)
 
Cheers,

Jeff

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux