Re: Help understanding indexes, explain, and optimizing

Chris <dmagick@xxxxxxxxx> · Tue, 07 Mar 2006 11:40:19 +1100

i.v.r. wrote:
Hi everyone,

I'm experimenting with PostgreSQL, but since I'm no expert DBA, I'm 
experiencing some performance issues.

Please take a look at the following query:

SELECT
 /*groups."name" AS t2_r1,
 groups."id" AS t2_r3,
 groups."user_id" AS t2_r0,
 groups."pretty_url" AS t2_r2,
 locations."postal_code" AS t0_r6,
 locations."pretty_url" AS t0_r7,
 locations."id" AS t0_r8,
 locations."colony_id" AS t0_r0,
 locations."user_id" AS t0_r1,
 locations."group_id" AS t0_r2,
 locations."distinction" AS t0_r3,
 locations."street" AS t0_r4,
 locations."street_2" AS t0_r5,
 schools."updated" AS t1_r10,
 schools."level_id" AS t1_r4,
 schools."pretty_url" AS t1_r11,
 schools."user_id" AS t1_r5,
 schools."id" AS t1_r12,
 schools."type_id" AS t1_r6,
 schools."distinction" AS t1_r7,
 schools."cct" AS t1_r8,
 schools."created_on" AS t1_r9,
 schools."location_id" AS t1_r0,
 schools."service_id" AS t1_r1,
 schools."sustentation_id" AS t1_r2,
 schools."dependency_id" AS t1_r3*/
 groups.*,
 locations.*,
 schools.*
FROM locations
LEFT OUTER JOIN groups ON groups.id = locations.group_id
LEFT OUTER JOIN schools ON schools.location_id = locations.id
WHERE (colony_id = 71501)
ORDER BY groups.name, locations.distinction, schools.distinction

As you can see, I've commented out some parts. I did that as an 
experiment, and it improved the query by 2x. I really don't understand 
how is that possible... I also tried changing the second join to an 
INNER join, and that improves it a little bit also.

Anyway, the main culprit seems to be that second join. Here's the output 
from EXPLAIN:

Sort  (cost=94315.15..94318.02 rows=1149 width=852)
  Sort Key: groups.name, locations.distinction, schools.distinction
  ->  Merge Left Join  (cost=93091.96..94256.74 rows=1149 width=852)
        Merge Cond: ("outer".id = "inner".location_id)
        ->  Sort  (cost=4058.07..4060.94 rows=1148 width=646)
              Sort Key: locations.id
              ->  Hash Left Join  (cost=1.01..3999.72 rows=1148 width=646)
                    Hash Cond: ("outer".group_id = "inner".id)
                    ->  Index Scan using locations_colony_id on 
locations  (cost=0.00..3992.91 rows=1148 width=452)
                          Index Cond: (colony_id = 71501)
                    ->  Hash  (cost=1.01..1.01 rows=1 width=194)
                          ->  Seq Scan on groups  (cost=0.00..1.01 
rows=1 width=194)
        ->  Sort  (cost=89033.90..89607.67 rows=229510 width=206)
              Sort Key: schools.location_id
              ->  Seq Scan on schools  (cost=0.00..5478.10 rows=229510 
width=206)

I don't completely understand what that output means, but it would seem 
that the first join costs about 4000, but if I remove that join from the 
query, the performance difference is negligible. So as I said, it seems 
the problem is the join on the schools table.

I hope it's ok for me to post the relevant tables here, so here they are 
(I removed some constraints and indexes that aren't relevant to the 
query above):

CREATE TABLE groups
(
 user_id int4 NOT NULL,
 name varchar(50) NOT NULL,
 pretty_url varchar(50) NOT NULL,
 id serial NOT NULL,
 CONSTRAINT groups_pk PRIMARY KEY (id),
)

CREATE TABLE locations
(
 colony_id int4 NOT NULL,
 user_id int4 NOT NULL,
 group_id int4 NOT NULL,
 distinction varchar(60) NOT NULL,
 street varchar(60) NOT NULL,
 street_2 varchar(50) NOT NULL,
 postal_code varchar(5) NOT NULL,
 pretty_url varchar(60) NOT NULL,
 id serial NOT NULL,
 CONSTRAINT locations_pk PRIMARY KEY (id),
 CONSTRAINT colony FOREIGN KEY (colony_id)
     REFERENCES colonies (id) MATCH SIMPLE
     ON UPDATE NO ACTION ON DELETE NO ACTION,
 CONSTRAINT "group" FOREIGN KEY (group_id)
     REFERENCES groups (id) MATCH SIMPLE
     ON UPDATE NO ACTION ON DELETE NO ACTION,
)
CREATE INDEX locations_fki_colony
 ON locations
 USING btree
 (colony_id);
CREATE INDEX locations_fki_group
 ON locations
 USING btree
 (group_id);

CREATE TABLE schools
(
 location_id int4 NOT NULL,
 service_id int4 NOT NULL,
 sustentation_id int4 NOT NULL,
 dependency_id int4 NOT NULL,
 level_id int4 NOT NULL,
 user_id int4 NOT NULL,
 type_id int4 NOT NULL,
 distinction varchar(25) NOT NULL,
 cct varchar(20) NOT NULL,
 created_on timestamp(0) NOT NULL,
 updated timestamp(0),
 pretty_url varchar(25) NOT NULL,
 id serial NOT NULL,
 CONSTRAINT schools_pk PRIMARY KEY (id),
 CONSTRAINT "location" FOREIGN KEY (location_id)
     REFERENCES locations (id) MATCH SIMPLE
     ON UPDATE NO ACTION ON DELETE NO ACTION,
)
CREATE INDEX schools_fki_location
 ON schools
 USING btree
 (location_id);

So I'm wondering what I'm doing wrong. I migrated this database from 
MySQL, and on there it ran pretty fast.

Have you done an 'analyze' or 'vacuum analyze' over these tables?

A left outer join gets *everything* from the second table:

> LEFT OUTER JOIN groups ON groups.id = locations.group_id
> LEFT OUTER JOIN schools ON schools.location_id = locations.id

So they will load everything from groups and schools. Maybe they should 
be left join's not left outer joins?

--
Postgresql & php tutorials
http://www.designmagick.com/