Re: Bad query plan inside EXISTS clause

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Yeb Havinga a écrit :
Yeb Havinga wrote:
Kenneth Marshall wrote:
EXISTS matches NULLs too and since they are not indexed a
sequential scan is needed to check for them. Try using
IN instead.
This is nonsense in more than one way.
Hit ctrl-return a bit too slow - exists does not match null but a set of records, that is either empty or not empty. Also it is possible to index table columns with nulls, and then the indexes can still be used. Besides filtering record sets with expressions, indexes are also used for ordering. There the effect of indexes with nulls can be seen: where to put them: in front or after the non nulls? So indexes can be perfectly used in conjunction with nulls. I found the original mail rather intriguing and played with an example myself a bit, but could not repeat the behavior (9.0 devel version), in my case the exists used an index. Maybe it has something to do with the fact that the planner estimates to return 50000 rows, even when the actual numbers list only 1 hit. In the exists case, it can stop at the first hit. In the select all rows case, it must return all rows. Maybe a better plan emerges with better statistics?

Thanks for your quick investigation.

Changing the target statistics to 1000 for the column and analyzing the table has not changed the query plan.

I just notice that using a "LIMIT 1" also gives a bad query plan

EXPLAIN ANALYZE SELECT 1 FROM read_acls_cache WHERE users_md5 = '9bc9012eb29c0bb2ae3cc7b5e78c2acf' LIMIT 1; QUERY PLAN
---------------------------------------------------------------------------------------
Limit (cost=0.00..1.19 rows=1 width=0) (actual time=366.771..366.772 rows=1 loops=1) -> Seq Scan on read_acls_cache (cost=0.00..62637.01 rows=52517 width=0) (actual time=366.769..366.769 rows=1 loops=1) Filter: ((users_md5)::text = '9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text)
 Total runtime: 366.806 ms
(4 rows)

Perhaps the EXISTS clause is also trying to limit the sub select query.

There are no NULL value on this column but only 49 distinct values for 2.5m rows, around 51k rows per value.

What I am trying to do is to know if a value is present or not in the table, so far the fastest way is to use a COUNT:

EXPLAIN ANALYZE SELECT COUNT(1) FROM read_acls_cache WHERE users_md5 = '9bc9012eb29c0bb2ae3cc7b5e78c2acf';
                                     QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=35154.28..35154.29 rows=1 width=0) (actual time=20.242..20.242 rows=1 loops=1) -> Bitmap Heap Scan on read_acls_cache (cost=2176.10..35022.98 rows=52517 width=0) (actual time=6.937..15.025 rows=51446 loops=1) Recheck Cond: ((users_md5)::text = '9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text) -> Bitmap Index Scan on read_acls_cache_users_md5_idx (cost=0.00..2162.97 rows=52517 width=0) (actual time=6.835..6.835 rows=51446 loops=1) Index Cond: ((users_md5)::text = '9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text)
 Total runtime: 20.295 ms
(6 rows)

ben

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux