Re: Query Optimizer makes a poor choice

Tomas Vondra <tv@xxxxxxxx> · Thu, 01 Dec 2011 00:41:15 +0100

On 30.11.2011 23:22, Tyler Hains wrote:
>>> I haven't had a chance to experiment with the SET STATISTICS, but 
> that
>>> got me going on something interesting...
>>>
>>> Do these statistics look right?
>>>
>>> # SELECT attname, n_distinct, most_common_vals, histogram_bounds FROM
>>> pg_stats WHERE tablename = 'cards';
>>>
>> ...
>>> "card_set_id"   905
>>> "{5201,3203,3169,5679,5143,5204,5655,4322,5236,4513}"
>>> "{4,3080,3896,4349,4701,5179,5445,5706,6003,6361,6784}"
>>
>> This looks promising, because n_distinct is low enough that you can
>> cover almost all values with statistics.
>> raise the statistics and ANALYZE. should help.
>> (NOTE NOTE NOTE: assuming that the distribution is even)
>>
>>
>> ...
>> but one thing we see for sure is that you have not tuned your
>> PostgreSQL instance :-)
>> I would recommend pgtune, -> pgfoundry.org/projects/pgtune/
>> it covers most important stuff, *including* default_statistics_target.
>>
>>
>>
>> Filip
>>
> 
> I just tried the set statistics on our test system with essentially the 
> same end result. 

Can you describe the problem in a bit more detail? Because maybe you
just have the same problem as the OP.

Because with this (very simple) test case it works just fine.

========================================================================
create table test_tab (id int primary key, val int, txtval text);

insert into test_tab select i, mod(i, 10000), md5(i::text) from
generate_series(1,10000000) s(i);

create index test_tab_idx on test_tab (val);

analyze test_tab;
========================================================================

The table is about 730MB, the indexes are about 214MB each.

========================================================================
explain analyze select * from test_tab where val = 500 order by id;

1st execution (not cached): http://explain.depesz.com/s/1VQ (7786 ms)
2nd execution (cached):     http://explain.depesz.com/s/cnt (1 ms)

explain analyze select * from test_tab where val = 500 order by id limit 1;

1st execution (not cached): http://explain.depesz.com/s/nlE (66 ms)
2nd execution (cached):     http://explain.depesz.com/s/WNa (0.08 ms)
========================================================================

So in both cases the LIMIT (with index scan) is faster. Sure, there may
be cases when this does not work that well - maybe it's not well cached,
maybe there's some other issue.

But it clearly is not true that LIMIT is evil and should be avoided.

Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general