Bad Planner Statistics for Uneven distribution.

"Kevin McArthur" <Kevin@xxxxxxxxxxxx> · Fri, 21 Jul 2006 12:15:28 -0600

I discussed this with a few members of #postgresql 
freenode this morning. I'll keep it breif; [note: i have cleaned out columns not 
relevant]

I have two tables, brands and models_brands. The 
first has about 300 records, the later about 350,000 records. The number of 
distinct brands in the models_brands table is < 10.

=# \d 
models_brands
        Table 
"public.models_brands"
 Column 
|         
Type          | 
Modifiers
--------+-----------------------+-----------
 model  | 
integer               
| not null
 brand  | 
integer               
| not null
Indexes:
    "models_brands_brand" btree 
(brand)
Foreign-key constraints:

"models_brands_brand_fkey" FOREIGN KEY (brand) REFERENCES brands(brand_id) ON 
UPDATE CASCADE ON DELETE CASCADE

"models_brands_model_fkey" FOREIGN KEY (model) REFERENCES models(model_id) ON 
UPDATE CASCADE ON DELETE CASCADE

a=# \d 
brands;

Table "public.brands"
   Column   
|          
Type          
|                         
Modifiers
------------+------------------------+-----------------------------------------------------------
 brand_id   
| 
integer                
| not null default nextval('brands_brand_id_seq'::regclass)
 brand_name 
| character varying(255) | not null
Indexes:

"brands_pkey" PRIMARY KEY, btree (brand_id)

Now the plans/problems..

=# set enable_seqscan to on;
SET
=# explain 
analyze select distinct brand from 
models_brands;

QUERY 
PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Unique  
(cost=46300.70..48148.15 rows=4 width=4) (actual time=3699.691..6215.216 rows=4 
loops=1)
   ->  Sort  (cost=46300.70..47224.43 
rows=369489 width=4) (actual time=3699.681..5027.069 rows=369489 
loops=1)
         Sort Key: 
brand
         ->  Seq Scan 
on models_brands  (cost=0.00..6411.89 rows=369489 width=4) (actual 
time=0.040..1352.997 rows=369489 loops=1)
 Total runtime: 6223.666 
ms
(5 rows)

=# set enable_seqscan to off;
SET
=# explain 
analyze select distinct brand from 
models_brands;

QUERY 
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  
(cost=0.00..863160.68 rows=4 width=4) (actual time=0.131..2584.779 rows=4 
loops=1)
   ->  Index Scan using models_brands_brand on 
models_brands  (cost=0.00..862236.96 rows=369489 width=4) (actual 
time=0.122..1440.809 rows=369489 loops=1)
 Total runtime: 2584.871 
ms
(3 rows)

Picks the wrong plan here. Should pick the index 
with seqscanning enabled.

More (as a different wording/query)... (as 
suggested by others on irc)

=# set enable_seqscan to on;
SET
=# explain 
analyze select brand_id from brands where exists (select 1 from models_brands 
where brand = 
brands.brand_id);

QUERY 
PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq 
Scan on brands  (cost=0.00..30.09 rows=152 width=4) (actual 
time=7742.460..62567.543 rows=4 loops=1)
   Filter: 
(subplan)
   SubPlan
     ->  Seq 
Scan on models_brands  (cost=0.00..7335.61 rows=92372 width=0) (actual 
time=206.467..206.467 rows=0 
loops=303)

Filter: (brand = $0)
 Total runtime: 62567.626 ms

a=# set enable_seqscan to off;
SET
=# explain analyze select brand_id from brands 
where exists (select 1 from models_brands where brand = 
brands.brand_id);

QUERY 
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq 
Scan on brands  (cost=100000000.00..100000715.90 rows=152 width=4) (actual 
time=0.615..3.710 rows=4 loops=1)
   Filter: 
(subplan)
   SubPlan
     ->  Index 
Scan using models_brands_brand on models_brands  (cost=0.00..216410.97 
rows=92372 width=0) (actual time=0.008..0.008 rows=0 
loops=303)
           Index 
Cond: (brand = $0)
 Total runtime: 3.790 ms

It was also tried to similar results with a LIMIT 1 
in the subquery for exist.

More...

Seqscan still off..

=# explain analyze select distinct brand_id from 
brands inner join models_brands on (brand_id = 
brand);

QUERY 
PLAN                                                                        

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  
(cost=0.00..867782.58 rows=303 width=4) (actual time=0.391..4898.579 rows=4 
loops=1)
   ->  Merge Join  (cost=0.00..866858.85 
rows=369489 width=4) (actual time=0.383..3749.771 rows=369489 
loops=1)
         Merge Cond: 
("outer".brand_id = 
"inner".brand)
         ->  
Index Scan using brands_pkey on brands  (cost=0.00..15.53 rows=303 width=4) 
(actual time=0.080..0.299 rows=60 
loops=1)
         ->  Index 
Scan using models_brands_brand on models_brands  (cost=0.00..862236.96 
rows=369489 width=4) (actual time=0.013..1403.175 rows=369489 
loops=1)
 Total runtime: 4898.697 ms

=# set enable_seqscan to on;
SET
=# explain 
analyze select distinct brand_id from brands inner join models_brands on 
(brand_id = 
brand);

QUERY 
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Unique  
(cost=46300.70..52770.04 rows=303 width=4) (actual time=3742.046..8560.833 
rows=4 loops=1)
   ->  Merge Join  
(cost=46300.70..51846.32 rows=369489 width=4) (actual time=3742.035..7406.677 
rows=369489 loops=1)
         Merge 
Cond: ("outer".brand_id = 
"inner".brand)
         ->  
Index Scan using brands_pkey on brands  (cost=0.00..15.53 rows=303 width=4) 
(actual time=0.077..0.407 rows=60 
loops=1)
         ->  
Sort  (cost=46300.70..47224.43 rows=369489 width=4) (actual 
time=3741.584..5051.348 rows=369489 
loops=1)

Sort Key: 
models_brands.brand

->  Seq Scan on models_brands  (cost=0.00..6411.89 rows=369489 
width=4) (actual time=0.027..1346.178 rows=369489 loops=1)
 Total 
runtime: 8589.502 ms
(8 rows)

Hope that helps

Kevin McArthur