Re: Are bitmap index scans slow to start?

Gavin Flower <GavinFlower@xxxxxxxxxxxxxxxxx> · Sat, 23 Feb 2013 09:07:20 +1300



    On 23/02/13 08:05, Nikolas Everett
      wrote:

    
      I can't really help, but I can make it more clear
        why postgres is choosing a _bitmap_ index scan rather than a
        regular index scan.  With a regular index scan it pumps the
        index for the locations of the rows that it points to and loads
        those rows as it finds them.  This works great if the rows in
        the index are sorta sorted - that way it isn't jumping around
        the table randomly.  Random io is slow.  In a bitmap index scan
        pg pumps the index and buffers the by shoving them in a big
        bitmap.  Then, it walks the bitmap in order to produce in order
        io.  PG makes the choice based on a measure of the index's
        correlation.
        
          
        The problem comes down to you inserting the
          sessions concurrently with one another.  My instinct would be
          to lower the FILLFACTOR on newly created indecies so they can
          keep their entries more in order.  I'm not sure why I have
          that instinct but it feels right.  Also, you might could try
          clustering newly created tables on session_id and setting the
          fillfactor down so rows with the same session id will stick
          together on disk.
        

        Now that I look stuff up on the internet I'm not
          sure where I saw that pg tries to maintain a cluster using
          empty space from FILLFACTOR but I _think_ it does.  I'm not
          sure what is going on with my google foo today.
        

        Nik
      
      
        On Fri, Feb 22, 2013 at 12:50 PM, Carlo
          Stonebanks <stonec.register@xxxxxxxxxxxx>
          wrote:

          
                A
                    cool idea, but if I understand it correctly very
                    specific and fussy. New DB’s are spawned on this
                    model, and all the developers would have to be aware
                    of this non-standard behaviour, and DBA”s would have
                    to create these indexes every month, for every DB
                    (as the log tables are created every month). There
                    are 89 session_id values in the January log
                    (log_2013_01) so this would quickly get out of
                    control. But – like I said – an interesting idea for
                    more specific challenges.
                 
                
                    From: Marc Mamin [mailto:M.Mamin@xxxxxxxxxxxx] 

                        Sent: February 21, 2013 2:41 PM

                        To: Jeff Janes; Carlo Stonebanks

                        Cc: pgsql-performance@xxxxxxxxxxxxxx

                        Subject: AW:  Are bitmap index
                        scans slow to start?
                  
                
                            >Rebuilding
                                the index might help, as it would put
                                all the leaf pages holding values for
                                session_id=27 adjacent to each other, so
                                they would read from disk faster.  But
                                with a name like >"session_id", I
                                don't know how long such clustering
                                would last though.

                                
                                >If I'm right about the index
                                disk-read time, then switching to a
                                plain index scan rather than a bitmap
                                index scan would make no
                                difference--either way the data has to
                                come off the disk.  

                                
                            >>I'd prefer a

                                >>strategy that allowed fast
                                performance the first time, rather than
                                slow the

                                >>first time and extremely fast
                                subsequently.
                          
                        
                          Hello,

                              
                              if the index is only used to locate rows
                              for single session_id, you may consider
                              split it in a set of partial indexes.

                              
                              e.g. 

                              create index i_0 on foo where session_id%4
                              =0;

                              create index i_1 on foo where session_id%4
                              =1;

                              create index i_2 on foo where session_id%4
                              =2;

                              create index i_3 on foo where session_id%4
                              =3;

                              
                              (can be built in parallel using separate
                              threads)

                              
                              Then you will have to ensure that all your
                              WHERE clauses also contain the index
                              condition:

                              
                              WHERE session_id = 27 AND session_id%4
                              =27%4

                              
                              regards,

                              
                              Marc Mamin
                        
                      
    Could you use CLUSTER on the table after it had been
      closed off?  If appropriate, that should make the queries run much
      faster, as elated entries will be in the same or
        nearby blocks on disk.