I discussed this with a few members of #postgresql
freenode this morning. I'll keep it breif; [note: i have cleaned out columns not
relevant]
I have two tables, brands and models_brands. The
first has about 300 records, the later about 350,000 records. The number of
distinct brands in the models_brands table is < 10.
=# \d
models_brands
Table "public.models_brands" Column | Type | Modifiers --------+-----------------------+----------- model | integer | not null brand | integer | not null Indexes: "models_brands_brand" btree (brand) Foreign-key constraints: "models_brands_brand_fkey" FOREIGN KEY (brand) REFERENCES brands(brand_id) ON UPDATE CASCADE ON DELETE CASCADE "models_brands_model_fkey" FOREIGN KEY (model) REFERENCES models(model_id) ON UPDATE CASCADE ON DELETE CASCADE a=# \d
brands;
Table "public.brands" Column | Type | Modifiers ------------+------------------------+----------------------------------------------------------- brand_id | integer | not null default nextval('brands_brand_id_seq'::regclass) brand_name | character varying(255) | not null Indexes: "brands_pkey" PRIMARY KEY, btree (brand_id) Now the plans/problems..
=# set enable_seqscan to on;
SET =# explain analyze select distinct brand from models_brands; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Unique (cost=46300.70..48148.15 rows=4 width=4) (actual time=3699.691..6215.216 rows=4 loops=1) -> Sort (cost=46300.70..47224.43 rows=369489 width=4) (actual time=3699.681..5027.069 rows=369489 loops=1) Sort Key: brand -> Seq Scan on models_brands (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.040..1352.997 rows=369489 loops=1) Total runtime: 6223.666 ms (5 rows) =# set enable_seqscan to off;
SET =# explain analyze select distinct brand from models_brands; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=0.00..863160.68 rows=4 width=4) (actual time=0.131..2584.779 rows=4 loops=1) -> Index Scan using models_brands_brand on models_brands (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.122..1440.809 rows=369489 loops=1) Total runtime: 2584.871 ms (3 rows) Picks the wrong plan here. Should pick the index
with seqscanning enabled.
More (as a different wording/query)... (as
suggested by others on irc)
=# set enable_seqscan to on;
SET =# explain analyze select brand_id from brands where exists (select 1 from models_brands where brand = brands.brand_id); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Seq Scan on brands (cost=0.00..30.09 rows=152 width=4) (actual time=7742.460..62567.543 rows=4 loops=1) Filter: (subplan) SubPlan -> Seq Scan on models_brands (cost=0.00..7335.61 rows=92372 width=0) (actual time=206.467..206.467 rows=0 loops=303) Filter: (brand = $0) Total runtime: 62567.626 ms a=# set enable_seqscan to off;
SET =# explain analyze select brand_id from brands
where exists (select 1 from models_brands where brand =
brands.brand_id);
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------ Seq Scan on brands (cost=100000000.00..100000715.90 rows=152 width=4) (actual time=0.615..3.710 rows=4 loops=1) Filter: (subplan) SubPlan -> Index Scan using models_brands_brand on models_brands (cost=0.00..216410.97 rows=92372 width=0) (actual time=0.008..0.008 rows=0 loops=303) Index Cond: (brand = $0) Total runtime: 3.790 ms It was also tried to similar results with a LIMIT 1
in the subquery for exist.
More...
Seqscan still off..
=# explain analyze select distinct brand_id from
brands inner join models_brands on (brand_id =
brand);
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=0.00..867782.58 rows=303 width=4) (actual time=0.391..4898.579 rows=4 loops=1) -> Merge Join (cost=0.00..866858.85 rows=369489 width=4) (actual time=0.383..3749.771 rows=369489 loops=1) Merge Cond: ("outer".brand_id = "inner".brand) -> Index Scan using brands_pkey on brands (cost=0.00..15.53 rows=303 width=4) (actual time=0.080..0.299 rows=60 loops=1) -> Index Scan using models_brands_brand on models_brands (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.013..1403.175 rows=369489 loops=1) Total runtime: 4898.697 ms =# set enable_seqscan to on;
SET =# explain analyze select distinct brand_id from brands inner join models_brands on (brand_id = brand); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=46300.70..52770.04 rows=303 width=4) (actual time=3742.046..8560.833 rows=4 loops=1) -> Merge Join (cost=46300.70..51846.32 rows=369489 width=4) (actual time=3742.035..7406.677 rows=369489 loops=1) Merge Cond: ("outer".brand_id = "inner".brand) -> Index Scan using brands_pkey on brands (cost=0.00..15.53 rows=303 width=4) (actual time=0.077..0.407 rows=60 loops=1) -> Sort (cost=46300.70..47224.43 rows=369489 width=4) (actual time=3741.584..5051.348 rows=369489 loops=1) Sort Key: models_brands.brand -> Seq Scan on models_brands (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.027..1346.178 rows=369489 loops=1) Total runtime: 8589.502 ms (8 rows) Hope that helps
Kevin McArthur
|