Re: Bad plan on a huge table query

Jeff Janes <jeff.janes@xxxxxxxxx> · Sun, 31 Mar 2013 12:29:26 -0700

On Tue, Mar 26, 2013 at 5:08 AM, Daniel Cristian Cruz <danielcristian@xxxxxxxxx> wrote:

Well, I did it:

explain (analyze, buffers) 
select count(*) from turma.aula_confirmacao where 
inicio_aula::DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP; -- changed name because of a conflict in some queries

http://explain.depesz.com/s/Fzr

That looks reasonable now.  But what changed?

And just to update, this is the actual query and plan:

...

http://explain.depesz.com/s/YfXr

The part of this complex query that corresponds to that simple query now looks reasonable, while before it did not.  What changed in the mean time? Whatever it is that changed (a good vacuum analyze?), you probably need to do the same thing on  idx_presenca_3 .

But I see that the difference is not that the estimate moved, but rather that the number of actual rows moved, to be closer to the estimate.  I don't know what to make of that.

I guess that, there is something with estudante_periodo, because there is 24% with only one row and 50% with 5 or less rows on it:

That could be, but I don't really know how to tackle that.  So I'm starting at the bottom level queries that are simple (i.e. just an index scan, no complex joins or filters) and yet are poorly estimated, because those ones are tractable.  Maybe the horrible estimates at the top are just the cumulation of bad estimates at the bottom (but maybe they are not--but still it seems to make sense the tackle the easy ones first)  

> After the refactoring, idx_aula_confirmacao_2 became idx_aula_confirmacao_1:

select * from pg_stats where tablename = 'idx_aula_confirmacao_1';

Alas, this information is not so interesting anymore, because now the query is getting the estimate at this step pretty close.  It is back when the estimate was so horrible that this info would be interesting.

-[ RECORD 1 ]----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

schemaname             | turma
tablename              | idx_aula_confirmacao_1
attname                | inicio_aula
inherited              | f
null_frac              | 0.996792

But, with the null_frac so high, a partial index for "where ... not null" might be a good idea.  I doubt it would make the query change the execution plan, but it would at least take up less disk space and so be more cacheable.

Cheers,

Jeff