Re: Bitmap scan is undercosted?

Vitaliy Garnashevich <vgarnashevich@xxxxxxxxx> · Sun, 3 Dec 2017 23:15:01 +0200



    On 02/12/2017 23:17, Jeff Janes wrote:

    
          Right, so there is a cpu costing
            problem (which could only be fixed by hacking postgresql and
            recompiling it), but it is much smaller of a problem than
            the IO cost not being accurate due to the high hit rate. 
            Fixing the CPU costing problem is unlikely to make a
            difference to your real query.  If you set the page costs to
            zero, what happens to your real query?
        
      
    I can't reproduce the exact issue on the real database any more. The
    query started to use the slow bitmap scan recently, and had been
    doing so for some time lately, but now it's switched back to use the
    index scan. The table involved in the query gets modified a lot. It
    has hundreds of millions of rows. Lots of new rows are appended to
    it every day, the oldest rows are sometimes removed. The table is
    analyzed at least daily. It's possible that statistics was updated
    and that caused the query to run differently. But I still would like
    to understand why that issue happened, and how to properly fix it,
    in case the issue returns.

    
              But I doubt that the
                settings seq_page_cost = random_page_cost = 0.0 should
                actually be used.
            
            
            Why not?  If your production server really has
              everything in memory during normal operation, that is the
              correct course of action.  If you ever restart the server,
              then you could have some unpleasant time getting it back
              up to speed again, but pg_prewarm could help with that.  

            
    In the real database, not everything is in memory. There are 200GB+
    of RAM, but DB is 500GB+. The table involved in the query itself is
    60GB+ of data and 100GB+ of indexes. I'm running the test case in a
    way where all reads are done from RAM, only to make it easier to
    reproduce and to avoid unrelated effects.

    
    As far as know, costs in Postgres were designed to be relative to
    seq_page_cost, which for that reason is usually defined as 1.0. Even
    if everything would be in RAM, accesses to the pages would still not
    have zero cost. Setting 0.0 just seems too extreme, as all other
    non-zero costs would become infinitely bigger.

    
          If you really want to target the plan
            with the BitmapAnd, you should increase 
            cpu_index_tuple_cost and/or cpu_operator_cost but not
            increase cpu_tuple_cost.  That is because the  unselective
            bitmap index scan does not incur any cpu_tuple_cost, but
            does incur index_tuple and operator costs.  Unfortunately
            all other index scans in the system will also be skewed by
            such a change if you make the change system-wide.
        
      
    Exactly. I'd like to understand why the worse plan is being chosen,
    and 1) if it's fixable by tuning costs, to figure out the right
    settings which could be used in production, 2) if there is a bug in
    Postgres optimizer, then to bring some attention to it, so that it's
    eventually fixed in one of future releases, 3) if Postgres is
    supposed to work this way, then at least I (and people who ever read
    this thread) would understand it better.

    
    Regards,

    Vitaliy