On Wed, 27 Apr 2016 14:09:04 +1200 David Rowley <david.rowley@xxxxxxxxxxxxxxx> wrote: Hi David Thanks for your time on this. I tried your proposals with the results below. > > ... > > Under 5 ms. The same goes for querying the "adverse" column in the "report_adverses" table: under 20 ms. > > I'm not sure why you're comparing this to the join plan above. They're > very different, I can only imagine that it' because you've not quite > understood what the EXPLAIN output is saying. Well, it is more due to me poorly wording the question I guess. I understand that both plans are totally different and both queries are fairly incomparable as two very different things are happening. My point here was to show that the "drug" column seems to be indexed correctly and, even though it is a very large table, the query executes very fast (all these execution times are given after flushing the OS's disk cache). > > -> Nested Loop (cost=0.87..94657.59 rows=14005 width=72) (actual time=0.822..2038.952 rows=14199 loops=1) > > -> Index Scan using report_drugs_drug_idx on report_drugs d (cost=0.44..500.28 rows=14005 width=31) (actual time=0.669..3.900 rows=14199 loops=1) > > Index Cond: (drug = ANY ('{359,360,361,362,363}'::integer[])) > > -> Index Scan using reports_id_key on reports r (cost=0.43..6.71 rows=1 width=41) (actual time=0.143..0.143 rows=1 loops=14199) > > Index Cond: (id = d.rid) > > This is a parameterised nested loop. This means that the inner side of > the loop (reports), is parameterised by the outerside of the loop, you > can see the d.rid in the Index Cond. This means that the > reports_id_key index is looked up 14199 times. You can see from the > plan that the nested loop here took 2038.952 milliseconds to complete, > so about 0.144 ms per lookup, not too bad, right? That is indeed not bad at all, thanks for pointing this out. This nested loop is thus responsible for little over 40% of the execution time. > The alternative to a parameterised nested loop plan is that a Hash > Join plan is generated. If you imagine what would happen here, likely > the matching report_drugs records would be hashed, then the outer side > of the join would then perform a SeqScan over the entire 6 million > reports records... probably not better. You can test this yourself > with; SET enable_nestloop = 0; you might need to SET enable_mergejoin > = 0; too. I imagine it'll be slower. Doing this with the current (unaltered) indexes almost doubles the time and it seems to behave exactly as you mentioned: Sort (cost=372968.28..372969.07 rows=317 width=41) (actual time=9308.174..9308.187 rows=448 loops=1) Sort Key: r.created Sort Method: quicksort Memory: 60kB -> Hash Join (cost=130941.07..372955.11 rows=317 width=41) (actual time=8131.509..9307.983 rows=448 loops=1) Hash Cond: (r.id = d.rid) -> Seq Scan on reports r (cost=0.00..105465.00 rows=5749300 width=20) (actual time=0.237..849.564 rows=5749300 loops=1) -> Hash (cost=130937.10..130937.10 rows=317 width=29) (actual time=8130.175..8130.175 rows=448 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 25kB -> Hash Join (cost=122150.13..130937.10 rows=317 width=29) (actual time=8118.130..8130.090 rows=448 loops=1) Hash Cond: (d.rid = a.rid) -> Index Scan using report_drugs_drug_idx on report_drugs d (cost=0.44..496.92 rows=13870 width=8) (actual time=0.830..3.503 rows=14200 loops=1) Index Cond: (drug = ANY ('{359,360,361,362,363}'::integer[])) -> Hash (cost=121170.69..121170.69 rows=78320 width=21) (actual time=8116.544..8116.544 rows=76972 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 3609kB -> Bitmap Heap Scan on report_adverses a (cost=2024.68..121170.69 rows=78320 width=21) (actual time=19.303..8093.257 rows=76972 loops=1) Recheck Cond: (adverse = ANY ('{"back pain - nonspecific","nonspecific back pain","back pain"}'::text[])) Heap Blocks: exact=56959 -> Bitmap Index Scan on report_adverses_adverse_idx (cost=0.00..2005.10 rows=78320 width=0) (actual time=11.756..11.756 rows=76972 loops=1) Index Cond: (adverse = ANY ('{"back pain - nonspecific","nonspecific back pain","back pain"}'::text[])) Planning time: 11.348 ms Execution time: 9308.694 ms > Likely the only way to speed this up would be to create indexes; > > create index on reports (id, age, gender, created); > ... > create index on report_adverses (rid, adverse); > > this might speedup the 2nd join a bit. Done. This does help a bit. The query runs about ~1 second faster. The plan: Sort (cost=66065.73..66066.59 rows=344 width=41) (actual time=4313.679..4313.708 rows=448 loops=1) Sort Key: r.created Sort Method: quicksort Memory: 60kB -> Nested Loop (cost=1.44..66051.23 rows=344 width=41) (actual time=43.987..4313.435 rows=448 loops=1) Join Filter: (d.rid = a.rid) -> Nested Loop (cost=1.00..54700.19 rows=14319 width=28) (actual time=0.772..1158.338 rows=14200 loops=1) -> Index Scan using report_drugs_drug_idx on report_drugs d (cost=0.44..507.78 rows=14319 width=8) (actual time=0.579..4.327 rows=14200 loops=1) Index Cond: (drug = ANY ('{359,360,361,362,363}'::integer[])) -> Index Only Scan using reports_id_age_gender_created_idx on reports r (cost=0.56..3.77 rows=1 width=20) (actual time=0.081..0.081 rows=1 loops=14200) Index Cond: (id = d.rid) Heap Fetches: 0 -> Index Scan using report_adverses_rid_idx on report_adverses a (cost=0.44..0.78 rows=1 width=21) (actual time=0.222..0.222 rows=0 loops=14200) Index Cond: (rid = r.id) Filter: (adverse = ANY ('{"back pain - nonspecific","nonspecific back pain","back pain"}'::text[])) Rows Removed by Filter: 5 Planning time: 15.968 ms Execution time: 4313.755 ms Both the (rid, adverse) and the (id, age, gender, created) indexes are now used. Yet ... this is "only" 1 second faster, still 4 seconds to complete this query (the join(s)). Is this truly the most that I could get out of it ... or could further tweaking to PostgreSQL's configuration be of any help here? > David Rowley http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services Cheers, Tim -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general