Hi, I am using int8 field to pack a number of error flags. This
is very common technique for large tables to pack multiple flags in one integer
field. For most records – the mt_flags field is 0. Here is
the statistics (taken from pgAdmin Statistics tab for mt_flags column): Most common Values: {0,128,2,4,8) Most common Frequencies: {0.96797,0.023,0.0076,0.0005,0.00029) What I notice that when bit-AND function is used –
Postgres significantly underestimates the amount of rows: explain analyze select count(*) from mt__20090801
where mt_flags&8=0;
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=83054.43..83054.44 rows=1
width=0) (actual time=2883.154..2883.154 rows=1 loops=1) -> Seq Scan on mt__20090801
(cost=0.00..83023.93 rows=12200
width=0) (actual time=0.008..2100.390 rows=2439435
loops=1) Filter:
((mt_flags & 8) = 0) Total runtime: 2883.191 ms (4 rows) This is not an issue for the particular query above, but I
noticed that due to that miscalculation in many cases Postgres chooses plan
with Nested Loops for other queries. I can fix it by setting enable_nest_loops
to off, but it's not something I should set for all queries. Is there any way to help Postgres make a better estimation for
number of rows returned by bit function? Thanks, -Slava Moudry, Senior DW Engineer. 4Info Inc. P.S. table definition: \d
mt__20090801
Table "dw.mt__20090801"
Column
|
Type |
Modifiers --------------------------+-----------------------------+----------- mt_id
|
bigint
| not null mt_ts
| timestamp without time zone | ad_cost
|
numeric(10,5)
| short_code
|
integer
| message_id
|
bigint
| not null mp_code
|
character(1)
| not null al_id
|
integer
| not null cust_id
|
integer
| device_id
|
integer
| not null broker_id
|
smallint
| partner_id
|
integer
| ad_id
|
integer
| keyword_id
|
integer
| sc_id
|
integer
| cp_id
|
integer
| src_alertlog_id
|
bigint
| src_query_id
|
bigint
| src_response_message_num
|
smallint
| src_gateway_message_id
|
bigint
| mt_flags
| integer |
message_length
|
integer
| not null created_etl
| timestamp without time zone | Indexes:
"mt_device_id__20090801" btree (device_id) WITH (fillfactor=100),
tablespace "index2"
"mt_ts__20090801" btree (mt_ts) WITH (fillfactor=100) CLUSTER,
tablespace "index2" Check
constraints:
"mt__20090801_mt_ts_check" CHECK (mt_ts >= '2009-08-01
00:00:00'::timestamp without time zone AND mt_ts < '2009-08-02
00:00:00'::timestamp without time zone) Inherits:
mt Tablespace:
"dw_tables3" |