On Tue, Oct 11, 2011 at 8:05 PM, Dave Crooke <dcrooke@xxxxxxxxx> wrote: > Hi James > > > I'm guessing the problem is that the combination of using a view and the way > the view is defined with an in-line temporary table is too complex for the > planner to introspect into, transform and figure out the equivalent direct > query, and so it's creating that entire temporary table every time you > evaluate the select. > > Our app has some similar queries (get me the most recent row from a data > logging table) and these work fine with a simple self-join, like this > example (irrelevant columns omitted for discussion) > > select t.entity, t.time_stamp, t.data from log_table t > where t.entity=cast('21EC2020-3AEA-1069-A2DD-08002B30309D' as uuid) > and t.time_stamp= > (select max(time_stamp) > from log_table u > where t.entity=u.entity) > > given a schema with the obvious indexes ... > > create table log_table > (entity UUID, > time_stamp TIMESTAMP WITHOUT TIME ZONE, > data TEXT); > > create index log_table_index on log_table (entity, time_stamp); > > .. and the plan for the dependent sub-query does the obvious reverse index > scan as you'd expect / want. > > > > If you still want / need to have the view, I suspect that getting rid of the > temp table definition will fix it ... my effort is below, alternatively you > might be able to take your first example and pull out best_scores and define > it as a view alos, > > CREATE VIEW best_in_school_method3 AS > SELECT competition_name, academic_year_beginning, centre_number, entry_id, > total_score, (true) AS best_in_school FROM challenge_entries ce1 > WHERE total_score = > (SELECT MAX(total_score) FROM challenge_entries ce2 > WHERE ce1.competition_name=ce2.competition_name > AND ce1.academic_year_beginning=ce2.academic_year_beginning > AND ce1.centre_number=ce2.centre_number > ) This is a very common problem in SQL and has a lot of interesting solutions. In queries like this I usually use the 'ORDER BY total_score DESC LIMIT 1 trick. Modern postgres is *usually* smart enough to convert max to that, but not always. WHERE total_score = (SELECT total_score FROM challenge_entries ce2 WHERE ce1.competition_name=ce2.competition_name AND ce1.academic_year_beginning=ce2.academic_year_beginning AND ce1.centre_number=ce2.centre_number ORDER BY total_score DESC LIMIT 1 ) Another clever, and more portable way to write it which can sometimes give a better plan is like this: WHERE NOT EXISTS ( SELECT 1 FROM challenge_entries ce2 WHERE ce1.competition_name=ce2.competition_name AND ce1.academic_year_beginning=ce2.academic_year_beginning AND ce1.centre_number=ce2.centre_number AND ce1.total_score > ce2.total_score ) Yet another interesting way of getting the 'top' record based on a certain criteria is to write a custom aggregate. In postgres you can aggregate over the entire record type, not just plain fields, so that running your aggregate function looks like this: SELECT competition_name, academic_year_beginning, centre_number, max_challenge_entries(ce) FROM challenge_entries ce GROUP BY 1,2,3; Your function aggregator in SQL would look something like this: CREATE OR REPLACE FUNCTION max_challenge_entries_impl(challenge_entries, challenge_entries) returns challenge_entries AS $$ SELECT CASE WHEN ($2).total_score > ($1).total_score THEN $2 ELSE $1 END; $$ LANGUAGE SQL IMMUTABLE; This very STLish approach is rarely the best way to go performance wise although it can give better worst case plans in some cases (although total_score if in index can never be used for optimization). I mention it because I find it to be very clean conceptually and can be a great approach if your 'picking' algorithm is sufficiently more complex than 'field > field' and is also otherwise not optimizable. Anyways, food for thought. merlin -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance