Am 05/22/2017 um 09:57 PM schrieb Jeff Janes: > I need to make a view which decorates rows from a parent table with > aggregated values from a child table. I can think of two ways to write > it, one which aggregates the child table to make a new query table and > joins the parent to that, as shown in "view1" below. Or does subselect > in the select list to aggregate just the currently matching rows, and > returns that value, as in "view2" below. > > While these two are semantically equivalent, the planner doesn't > understand that, and always executes them pretty much the way you would > naively do it based on the text of the query. > > But view1 is pretty slow if the WHERE clause is highly selective (like > "WHERE cutoff<0.00001") because it has to summarize the entire child > table just to pull out a few rows. But view2 is pretty slow if the > entire view or most of it (like "WHERE cutoff<0.9") is being returned. > > Is there some 3rd way to write the query which allows the planner to > switch between strategies (summarize whole table vs summarize values on > demand) depending on the known selectivity of the where clause? > > In this case, the planner is getting the relative cost estimates roughly > correct. It is not a problem of mis-estimation. > > I can always create two views, view_small and view_large, and swap > between them based on my own knowledge of how restrictive a query is > likely to be, but that is rather annoying. Especially in the real-world > situation, which is quite a bit more complex than this. > > create table thing as select x as id, random() as cutoff from > generate_series(1,2000000) f(x); > > create table thing_alias as select > floor(power(random()*power(2000000,5),0.2))::int thing_id, md5(x::text), > random() as priority from generate_series(1,150000) f(x); > > create index on thing_alias (thing_id ); > > create index on thing (cutoff ); > > vacuum; analyze; > > create view view1 as select id, md5,cutoff from thing left join > ( > select distinct on (thing_id) thing_id, md5 from thing_alias > order by thing_id, priority desc > ) as foo > on (thing_id=id); > > create view view2 as select id, > ( > select md5 from thing_alias where thing_id=id > order by priority desc limit 1 > ) as md5, > cutoff from thing; > > Cheers, > > Jeff Hi Jeff, how does something like CREATE OR REPLACE VIEW public.view3 AS SELECT thing.id, foo.md5, thing.cutoff FROM thing, LATERAL ( SELECT DISTINCT ON (thing_alias.thing_id) thing_alias.thing_id, thing_alias.md5 FROM thing_alias WHERE thing_alias.thing_id = thing.id ORDER BY thing_alias.thing_id, thing_alias.priority DESC) foo work for you? At least that's always using an index scan here, as opposed to view1, which (for me) defaults to a SeqScan on thing_alias at a low cutoff. ***** Note btw. that both view1 and view2 don't return any md5 values for me, while view3 does! ***** Results (ms, median of 3 runs): cutoff< 0.1 0.9 view1: 348 1022 view2: 844 6484 view3: 842 5976 With LATERAL ( SELECT string_agg(thing_alias.md5, ','::text) AS md5 FROM thing_alias WHERE thing_alias.thing_id = thing.id GROUP BY thing_alias.thing_id) foo (which seems to make more sense ;-) I yield 483 (0.1) and 3410 (0.9) ms (and return md5-Aggregates). Cheers, -- Gunnar "Nick" Bluth DBA ELSTER Tel: +49 911/991-4665 Mobil: +49 172/8853339
Attachment:
smime.p7s
Description: S/MIME Cryptographic Signature