number of rows estimation for bit-AND operation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I am using int8 field to pack a number of error flags. This is very common technique for large tables to pack multiple flags in one integer field.

 

For most records – the mt_flags field is 0. Here is the statistics (taken from pgAdmin Statistics tab for mt_flags column):

Most common Values: {0,128,2,4,8)

Most common Frequencies: {0.96797,0.023,0.0076,0.0005,0.00029)

 

What I notice that when bit-AND function is used – Postgres significantly underestimates the amount of rows:

 

 

explain analyze select count(*) from mt__20090801 where  mt_flags&8=0;

                                                         QUERY PLAN                                                         

-----------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual time=2883.154..2883.154 rows=1 loops=1)

   ->  Seq Scan on mt__20090801  (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.008..2100.390 rows=2439435 loops=1)

         Filter: ((mt_flags & 8) = 0)

 Total runtime: 2883.191 ms

(4 rows)

 

This is not an issue for the particular query above, but I noticed that due to that miscalculation in many cases Postgres chooses plan with Nested Loops for other queries. I can fix it by setting enable_nest_loops to off, but it's not something I should set for all queries.

Is there any way to help Postgres make a better estimation for number of rows returned by bit function?

Thanks,

-Slava Moudry, Senior DW Engineer. 4Info Inc.

 

P.S. table definition:

 

\d mt__20090801

                      Table "dw.mt__20090801"

          Column          |            Type             | Modifiers

--------------------------+-----------------------------+-----------

 mt_id                    | bigint                      | not null

 mt_ts                    | timestamp without time zone |

 ad_cost                  | numeric(10,5)               |

 short_code               | integer                     |

 message_id               | bigint                      | not null

 mp_code                  | character(1)                | not null

 al_id                    | integer                     | not null

 cust_id                  | integer                     |

 device_id                | integer                     | not null

 broker_id                | smallint                    |

 partner_id               | integer                     |

 ad_id                    | integer                     |

 keyword_id               | integer                     |

 sc_id                    | integer                     |

 cp_id                    | integer                     |

 src_alertlog_id          | bigint                      |

 src_query_id             | bigint                      |

 src_response_message_num | smallint                    |

 src_gateway_message_id   | bigint                      |

 mt_flags                 | integer                     |

 message_length           | integer                     | not null

 created_etl              | timestamp without time zone |

Indexes:

    "mt_device_id__20090801" btree (device_id) WITH (fillfactor=100), tablespace "index2"

    "mt_ts__20090801" btree (mt_ts) WITH (fillfactor=100) CLUSTER, tablespace "index2"

Check constraints:

    "mt__20090801_mt_ts_check" CHECK (mt_ts >= '2009-08-01 00:00:00'::timestamp without time zone AND mt_ts < '2009-08-02 00:00:00'::timestamp without time

zone)

Inherits: mt

Tablespace: "dw_tables3"

 


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux