Re: Slow join over three tables

Karl Czajkowski <karlcz@xxxxxxx> · Wed, 27 Apr 2016 08:15:55 -0700

On Apr 27, Tim van der Linden modulated:
...
> I'm joining three fairly large tables together, and it is slow. The tables are:
> 
> - "reports": 6 million rows
> - "report_drugs": 20 million rows
> - "report_adverses": 20 million rows
> 
...
> All tables have indexes on the "id"/"rid" columns and on the "drug"/"adverse" columns.
> 
> The query:
> 
> SELECT r.id, r.age, r.gender, r.created, a.adverse, d.drug
> FROM reports r
> JOIN report_drugs d ON d.rid = r.id
> JOIN report_adverses a ON a.rid = r.id 
> WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back pain', 'back pain']) 
> AND d.drug = ANY (ARRAY[359, 360, 361, 362, 363]) ORDER BY r.created;
> 

I would suggest a few experiments to see how you can modify the plans
available to the optimizer:

1. CREATE INDEX ON report_drugs (drug, rid)
2. CREATE INDEX ON report_adverses (adverse, rid)
3. CREATE INDEX ON report (id, created)

Re-run EXPLAIN ANALYZE of your query after each of these steps to see
how each one affects planning.  You might also try two variants of the
query at each step, with and without the ORDER BY.

Note, the index column order in (1) and (2) above is very important.

Karl

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general