On 2015-05-29 10:55:44 +0200, Peter J. Holzer wrote: > wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register > from term t where facttablename='facttable_stat_fta4' and columnname='einheit' and exists (select 1 from facttable_stat_fta4 f where f.einheit=t.term ); > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Nested Loop Semi Join (cost=0.00..384860.48 rows=1 width=81) (actual time=0.061..0.119 rows=2 loops=1) > -> Index Scan using term_facttablename_columnname_idx on term t (cost=0.00..391.46 rows=636 width=81) (actual time=0.028..0.030 rows=3 loops=1) > Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'einheit'::text)) > -> Index Scan using facttable_stat_fta4_einheit_idx on facttable_stat_fta4 f (cost=0.00..384457.80 rows=21788970 width=3) (actual time=0.027..0.027 rows=1 loops=3) > Index Cond: ((einheit)::text = (t.term)::text) > Total runtime: 0.173 ms > (6 rows) > > 0.17 ms. Much faster than a plain select distinct over a table with 43 > million rows could ever hope to be. > > warenstrom is very similar and the columns with more distinct values > aren't that bad either. > > But for column berechnungsart the result is bad: > > wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register > from term t where facttablename='facttable_stat_fta4' and columnname='berechnungsart' and exists (select 1 from facttable_stat_fta4 f where f.berechnungsart=t.term ); > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Merge Semi Join (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1) > Merge Cond: ((t.term)::text = (f.berechnungsart)::text) > -> Index Scan using term_term_idx on term t (cost=0.00..319880.73 rows=636 width=81) (actual time=7703.809..7703.938 rows=3 loops=1) > Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text = 'berechnungsart'::text)) > -> Index Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f (cost=0.00..2545748.85 rows=43577940 width=2) (actual time=0.089..16263.582 rows=21336180 loops=1) > Total runtime: 30948.648 ms > (6 rows) > > Over 30 seconds! That's almost 200'000 times slower. First I'd like to apologize for dropping out of the thread without providing a test data set. I actually had one prepared (without confidential data), but I wanted to make sure that I could reproduce the problem with the test data, and I didn't get around to it for a week or two and then I went on vacation ... Anyway, in the meantime you released 9.5alpha (thanks for that, I probably would have compiled a snapshot sooner or later, but installing debian packages is just a lot more convenient - I hope you get a lot of useful feedback) and I installed that this weekend. I am happy to report that the problem appears to be solved. All the queries of this type I threw at the database finish in a few milliseconds now. hp -- _ | Peter J. Holzer | I want to forget all about both belts and |_|_) | | suspenders; instead, I want to buy pants | | | hjp@xxxxxx | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/
Attachment:
signature.asc
Description: Digital signature