On 08/12/2015 04:34 AM, robbyc wrote: > Hi, > > I am new to optimizing queries and i'm getting a slow running time > (~1.5secs) with the following SQL: Before mucking about with work_mem and indexes, the first thing to do is rewrite this query correctly. Here are just some of the things wrong with the query as written: * You're doing a DISTINCT on the same set of columns also in a GROUP BY. This is redundant and causes needless deduplication. * You're joining two GROUPed BY then DISTINCTed queries using the UNION operator which will do yet another pass for deduplication. * You've got the entire query repeated for just a simple difference in the global WHERE clause. These can be merged. * You've kept LEFT JOINs in the subquery but you don't use any values from them. These can be safely removed altogether. * You're using a NOT IN clause which is almost never what you want. Use NOT EXISTS instead. What is this list() function? How is it defined? Can it be replaced with string_agg()? You're not doing yourself any favors at all with all this quoting and mixed case stuff. Here is a rewritten version, please let me know how it performs: SELECT "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated", "Vacancy"."CustomAccess", "Department"."Name" as "Department", list("Occupation"."Name") as "Occupation", "Vacancy"."PositionNo", "Vacancy"."Template" FROM "Vacancy" JOIN "CategoryOption_TableRow" as "c_50" ON ( "c_50"."Category_TableID"= 50 AND "c_50"."RowID" = "Vacancy"."ID" AND "c_50"."CategoryOptionID"=19205) LEFT JOIN "CategoryOption_TableRow" as "c_22" ON ( "c_22"."RowID" = "Vacancy"."ID" AND "c_22"."Category_TableID" = 22) LEFT JOIN "CategoryOption" as "Occupation" ON ( "Occupation"."ID" = "c_22"."CategoryOptionID") LEFT JOIN "TableRow_TableRow" as "t_33" ON ( "t_33"."Table1RowID" = "Vacancy"."ID" AND "t_33"."Table_TableID" = 33) LEFT JOIN "Department" ON ( "Department"."ID" = "t_33"."Table2RowID" AND "Department"."Active" = 't' AND "Department"."ClientID" = 263) WHERE "Vacancy"."ClientID" = 263 AND NOT EXISTS ( SELECT 1 FROM "Vacancy" as _Vacancy JOIN "CategoryOption_TableRow" "ct126" on ( "ct126"."Category_TableID" = 126 AND "RowID" = _Vacancy."ID") WHERE _Vacancy."Template" AND "ct126"."CategoryOptionID" IN (34024,35254,35255,35256) AND _Vacancy."ID" = "Vacancy"."ID") AND ("Vacancy"."Template" = 't' OR "Vacancy"."Level" = 1) GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated", "Vacancy"."CustomAccess", "Department"."Name", "Vacancy"."PositionNo", "Vacancy"."Template" > SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle", > "Vacancy"."DateCreated", "Vacancy"."CustomAccess" > , "Department"."Name" as "Department", list("Occupation"."Name") as > "Occupation", "Vacancy"."PositionNo" > , "Vacancy"."Template" from > "Vacancy" > LEFT JOIN "CategoryOption_TableRow" as "c_22" > ON ("c_22"."RowID" = "Vacancy"."ID" > and "c_22"."Category_TableID" = 22) > LEFT JOIN "CategoryOption" as "Occupation" > ON ("Occupation"."ID" = "c_22"."CategoryOptionID") > LEFT JOIN "TableRow_TableRow" as "t_33" > ON ("t_33"."Table1RowID" = "Vacancy"."ID" > and "t_33"."Table_TableID" = 33 ) > LEFT JOIN "Department" > ON ("Department"."ID" = "t_33"."Table2RowID" and > "Department"."Active" = 't' and "Department" > ."ClientID" = 263) > JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50 > and "c_50"."RowID" = "Vacancy" > ."ID" and "c_50"."CategoryOptionID"=19205) > WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT > DISTINCT("Vacancy"."ID") > FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on > ("ct126"."Category_TableID" = 126 > and "RowID" = "Vacancy"."ID") > left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID" > and "Workflow"."Level" > = 1) > left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" = > "Workflow"."ID" and "c30"."Category_TableID" > = 30 and "c30"."CategoryOptionID" = 21923) > WHERE "Template" AND "ct126"."CategoryOptionID" > IN(34024,35254,35255,35256)) and "Vacancy" > ."Template" = 't' > GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated", > "Vacancy"."CustomAccess", "Department" > ."Name", "Vacancy"."PositionNo", "Vacancy"."Template" > UNION > SELECT distinct("Vacancy"."ID"), "Vacancy"."JobTitle", > "Vacancy"."DateCreated", "Vacancy"."CustomAccess" > , "Department"."Name" as "Department", list("Occupation"."Name") as > "Occupation", "Vacancy"."PositionNo" > , "Vacancy"."Template" from > "Vacancy" > LEFT JOIN "CategoryOption_TableRow" as "c_22" > ON ("c_22"."RowID" = "Vacancy"."ID" > and "c_22"."Category_TableID" = 22) > LEFT JOIN "CategoryOption" as "Occupation" > ON ("Occupation"."ID" = "c_22"."CategoryOptionID") > LEFT JOIN "TableRow_TableRow" as "t_33" > ON ("t_33"."Table1RowID" = "Vacancy"."ID" > and "t_33"."Table_TableID" = 33 ) > LEFT JOIN "Department" > ON ("Department"."ID" = "t_33"."Table2RowID" and > "Department"."Active" = 't' and "Department" > ."ClientID" = 263) > JOIN "CategoryOption_TableRow" as "c_50" ON ("c_50"."Category_TableID"= 50 > and "c_50"."RowID" = "Vacancy" > ."ID" and "c_50"."CategoryOptionID"=19205) > WHERE "Vacancy"."ClientID" = 263 AND "Vacancy"."ID" NOT IN(SELECT > DISTINCT("Vacancy"."ID") > FROM "Vacancy" join "CategoryOption_TableRow" "ct126" on > ("ct126"."Category_TableID" = 126 > and "RowID" = "Vacancy"."ID") > left join "Workflow" on ("Workflow"."VacancyID" = "Vacancy"."ID" > and "Workflow"."Level" > = 1) > left join "CategoryOption_TableRow" "c30" on ("c30"."RowID" = > "Workflow"."ID" and "c30"."Category_TableID" > = 30 and "c30"."CategoryOptionID" = 21923) > WHERE "Template" AND "ct126"."CategoryOptionID" > IN(34024,35254,35255,35256)) and "Vacancy" > ."Template" <> 't' AND "Vacancy"."Level" = 1 > GROUP BY "Vacancy"."ID", "Vacancy"."JobTitle", "Vacancy"."DateCreated", > "Vacancy"."CustomAccess", "Department" > ."Name", "Vacancy"."PositionNo", "Vacancy"."Template" > ORDER BY "JobTitle" > > Running explain analyze gives me the following information: > http://explain.depesz.com/s/pdC <http://explain.depesz.com/s/pdC> > > For a total runtime: 2877.157 ms > > If i remove the left joins on Department and TableRow_TableRow this reduces > the run time by about a third. > Additionally removing CategoryOption and CategoryOption_TableRow joins > further reduces by a about a third. > > Given that i need both these joins for the information retrieved by them, > what would be the best way to re-factor this query so it runs faster? > > Looking at the output of explain analyze the hash aggregates and sort seem > to be the primary issue. > > Thanks in advance -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance