On 2010-08-25, Joshua Berry wrote: > Hi Group, > > I've never really learned how to optimize queries that join > several tables and have order by clauses that specify columns > from each table. Is there documentation that could help me > optimize and have the proper indexes in place? I've read > through the PG Docs Chapter 11 on Indexes yet still lack the > needed understanding. > > Here's my latest culprit: > > select Anl.Priority, Anl.Lab, Anl.Job, JOB.DateIn, > JOB.CompanyCode, Anl.SampleName > from analysis anl join job on anl.job = job.job > order by job.companycode, anl.job, anl.lab > limit 10; > Could you try to remove the limit clause? I have seen several times, that it may slow down a query. Although I haven't tested it, that an OFFSET 0 clause can improve the situation, iirc. >From an algebraic point of view, I cannot see obvious inefficiencies. Others, which now the internals of pg better, might see more. > Here's the query plan using PG 8.4.4: > Limit (cost=21990.24..21990.27 rows=10 width=32) > -> Sort (cost=21990.24..22437.69 rows=178979 width=32) > Sort Key: job.companycode, anl.lab > -> Hash Join (cost=451.20..18122.57 rows=178979 width=32) > Hash Cond: (anl.job = job.job) -> Seq Scan on analysis > anl (cost=0.00..14091.79 rows=178979 width=23) -> Hash > (cost=287.20..287.20 rows=13120 width=17) > -> Seq Scan on job (cost=0.00..287.20 > rows=13120 width=17) > > > If I change the above query to only order by one of the > tables, I get better results: > select Anl.Priority, Anl.Lab, Anl.Job, JOB.DateIn, > JOB.CompanyCode, Anl.SampleName > from analysis anl join job on anl.job = job.job > order by job.companycode --, anl.job, anl.lab > limit 10; > Limit (cost=0.00..3.65 rows=10 width=32) > -> Nested Loop (cost=0.00..65269.13 rows=178979 width=32) > -> Index Scan using job_companycode on job (cost=0.00..972.67 > rows=13120 width=17) -> Index Scan using analysis_job_lab on > analysis anl > (cost=0.00..4.63 rows=22 width=23) > Index Cond: (anl.job = job.job) > That are estimated query plans, what does EXPLAIN ANALYZE say? The query plans above do not execute the query instead they just make a rough guess about the costs. Reality might be different. Also you may want to run VACUUM ANALYZE before. > Any idea on how I can improve this? In the past I would tend > to create a cached copy of the query as a table that would be > utilized, but I suspect that there is a better way to go > about this. I'm using a system (Clarion) which heavily uses > cursors via the ODBC driver (I use the psqlODBC latest > version) to get a handful of records at a time, so no actual > LIMITs would be used in the production queries; I've added > the LIMITs here to try to simulate the performance > differences that I find when browsing the data while ordering > by the above columns. > > > Here are the relevant tables and indexes: > > > CREATE TABLE job > ( > job bigint NOT NULL, -- Job # > companycode character(4), -- Company Code > recdbycode character(3), -- Initials of who checked in sample(s) > datein date, -- Date sample was received > project character varying, -- Project or Site name > remarks text, -- Remarks > --[CONSTRAINTs etc] > ) > > CREATE INDEX job_companycode > ON job > USING btree > (companycode); > CREATE INDEX job_companycode_job > ON samples.job > USING btree > (companycode, job); > Index job_companycode is not used in the plans. Additionally, it can be constructed from the second index, as companycode is the primary sort key. > CREATE TABLE analysis > ( > lab bigint NOT NULL, -- Lab number > job bigint, -- Job number > sampletype character varying(5), -- General class of sample > priority character(1), -- Priority level > samplename character varying, -- Sample or Well name > CONSTRAINT rel_joblabk_to_jobk FOREIGN KEY (job) > REFERENCES job (job) MATCH SIMPLE > ON UPDATE CASCADE ON DELETE RESTRICT, > --[CONSTRAINTs etc] > ) > > CREATE INDEX analysis_companycode_job_lab > ON analysis > USING btree > (companycode, job, lab); > > CREATE INDEX analysis_job_lab > ON analysis > USING btree > (job, lab); > Maybe, the planner decides for a Sort Join, if there are sorted indexes for anl.job and job.job. But the speed-up may vary depending on the data. > Thanks for any insights and tips you can provide! > > Kind Regards, > -Joshua Berry > > > HTH -- Robert... -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general