On Wed, 27 Apr 2016 13:48:06 +0200 Alban Hertroys <haramrae@xxxxxxxxx> wrote: > In this case, you're using the values in adverse to filter relevant rid's for the FK join, so you might be better off with the inverse of above index: > create index on report_adverses (adverse, rid); > create index on report_drugs (drug, rid); Hmmmm, like I reported yesterday this achieved a huge performance boost. However, two (new) things I like to touch on while further experimenting with this query: 1. Altering Drug IDs or Adverse names effects the executing time negatively. In this example altering the drug IDs I search for makes the query 6 times slower again: SELECT r.id, r.age, r.gender, r.created, a.adverse, d.drug FROM reports r JOIN report_drugs d ON d.rid = r.id JOIN report_adverses a ON a.rid = r.id WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back pain', 'back pain']) AND d.drug = ANY (ARRAY[9557, 17848, 17880, 18223]) ORDER BY r.created; Different drug ID ([9557, 17848, 17880, 18223]), but the rest is the same. Query plan: Sort (cost=31409.71..31417.48 rows=3107 width=41) (actual time=662.707..662.819 rows=4076 loops=1) Sort Key: r.created Sort Method: quicksort Memory: 415kB -> Nested Loop (cost=25693.17..31229.48 rows=3107 width=41) (actual time=71.748..661.743 rows=4076 loops=1) -> Merge Join (cost=25692.61..26795.53 rows=3107 width=29) (actual time=70.841..97.094 rows=4076 loops=1) Merge Cond: (d.rid = a.rid) -> Sort (cost=16332.08..16671.61 rows=135810 width=8) (actual time=48.946..58.623 rows=135370 loops=1) Sort Key: d.rid Sort Method: quicksort Memory: 12498kB -> Index Scan using report_drugs_drug_idx on report_drugs d (cost=0.44..4753.44 rows=135810 width=8) (actual time=0.681..28.441 rows=135532 loops=1) Index Cond: (drug = ANY ('{9557,17848,17880,18223}'::integer[])) -> Sort (cost=9360.53..9556.94 rows=78565 width=21) (actual time=21.880..25.969 rows=77163 loops=1) Sort Key: a.rid Sort Method: quicksort Memory: 6682kB -> Index Only Scan using report_adverses_adverse_rid_idx on report_adverses a (cost=0.56..2972.57 rows=78565 width=21) (actual time=0.983..10.744 rows=76972 loops=1) Index Cond: (adverse = ANY ('{"back pain - nonspecific","nonspecific back pain","back pain"}'::text[])) Heap Fetches: 0 -> Index Only Scan using reports_id_age_gender_created_idx on reports r (cost=0.56..1.42 rows=1 width=20) (actual time=0.137..0.138 rows=1 loops=4076) Index Cond: (id = d.rid) Heap Fetches: 0 Planning time: 16.438 ms Execution time: 663.583 ms A total of 660 ms instead of the previous 120 ms. The amount of rows returned are 4076, instead of the 448 rows of the previous query. Could this be the cause of the slow down? Or is it the fact that the integers in the ANY clause are further apart from one another and thus make for longer index searches? The same happens with changing the adverses, some adverses come back in ~120 ms, others can take up to one second. 2. Adding extra columns ignores indexes This one was kind of expected. Adding extra columns from both "report_drugs" and "report_adverses" slows the lot down again. Extra columns added: - a.recovery - a.severity - d.reason - d.effectiveness - d.duration As I would expect, adding these columns would make the previous multi-column indexes useless. And they indeed were not used anymore. So my initial thought was to create new indexes on the columns I now query: - CREATE INDEX ON report_adverses(adverse, rid, severity, recovery); - CREATE INDEX ON report_drugs(drug, rid, reason, effectiveness, duration); After running the query again, the new index on "report_adverses" got picked up, but the index on "report_drugs" did not: Sort (cost=12365.79..12366.61 rows=329 width=76) (actual time=129.106..129.120 rows=448 loops=1) Sort Key: r.created Sort Method: quicksort Memory: 66kB -> Nested Loop (cost=11212.93..12352.04 rows=329 width=76) (actual time=31.558..128.951 rows=448 loops=1) Join Filter: (d.rid = r.id) -> Merge Join (cost=11212.38..11680.44 rows=329 width=64) (actual time=30.705..39.527 rows=448 loops=1) Merge Cond: (d.rid = a.rid) -> Sort (cost=1503.85..1539.82 rows=14390 width=35) (actual time=6.977..7.993 rows=14074 loops=1) Sort Key: d.rid Sort Method: quicksort Memory: 1202kB -> Index Scan using report_drugs_drug_idx on report_drugs d (cost=0.44..510.02 rows=14390 width=35) (actual time=0.567..4.638 rows=14200 loops=1) Index Cond: (drug = ANY ('{359,360,361,362,363}'::integer[])) -> Sort (cost=9708.53..9904.94 rows=78565 width=29) (actual time=23.717..26.540 rows=76974 loops=1) Sort Key: a.rid Sort Method: quicksort Memory: 6702kB -> Index Only Scan using report_adverses_adverse_rid_severity_recovery_idx on report_adverses a (cost=0.56..3320.57 rows=78565 width=29) (actual time=0.878..12.297 rows=76972 loops=1) Index Cond: (adverse = ANY ('{"back pain - nonspecific","nonspecific back pain","back pain"}'::text[])) Heap Fetches: 0 -> Index Only Scan using reports_id_age_gender_created_idx on reports r (cost=0.56..2.03 rows=1 width=20) (actual time=0.198..0.199 rows=1 loops=448) Index Cond: (id = a.rid) Heap Fetches: 0 Planning time: 18.310 ms Execution time: 129.483 ms Still fast enough ... but I was wondering why the planner would not use the new index and instead fall back on the "report_drugs_drug_idx" single column index. > Alban Hertroys > -- > If you can't see the forest for the trees, > cut the trees and you'll find there is no forest. Cheers, Tim -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general