Yeb Havinga a écrit :
Yeb Havinga wrote:
Kenneth Marshall wrote:
EXISTS matches NULLs too and since they are not indexed a
sequential scan is needed to check for them. Try using
IN instead.
This is nonsense in more than one way.
Hit ctrl-return a bit too slow - exists does not match null but a set of
records, that is either empty or not empty. Also it is possible to index
table columns with nulls, and then the indexes can still be used.
Besides filtering record sets with expressions, indexes are also used
for ordering. There the effect of indexes with nulls can be seen: where
to put them: in front or after the non nulls? So indexes can be
perfectly used in conjunction with nulls. I found the original mail
rather intriguing and played with an example myself a bit, but could not
repeat the behavior (9.0 devel version), in my case the exists used an
index. Maybe it has something to do with the fact that the planner
estimates to return 50000 rows, even when the actual numbers list only 1
hit. In the exists case, it can stop at the first hit. In the select all
rows case, it must return all rows. Maybe a better plan emerges with
better statistics?
Thanks for your quick investigation.
Changing the target statistics to 1000 for the column and analyzing the
table has not changed the query plan.
I just notice that using a "LIMIT 1" also gives a bad query plan
EXPLAIN ANALYZE SELECT 1 FROM read_acls_cache WHERE users_md5 =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf' LIMIT 1;
QUERY PLAN
---------------------------------------------------------------------------------------
Limit (cost=0.00..1.19 rows=1 width=0) (actual time=366.771..366.772
rows=1 loops=1)
-> Seq Scan on read_acls_cache (cost=0.00..62637.01 rows=52517
width=0) (actual time=366.769..366.769 rows=1 loops=1)
Filter: ((users_md5)::text =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text)
Total runtime: 366.806 ms
(4 rows)
Perhaps the EXISTS clause is also trying to limit the sub select query.
There are no NULL value on this column but only 49 distinct values for
2.5m rows, around 51k rows per value.
What I am trying to do is to know if a value is present or not in the
table, so far the fastest way is to use a COUNT:
EXPLAIN ANALYZE SELECT COUNT(1) FROM read_acls_cache WHERE users_md5 =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf';
QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=35154.28..35154.29 rows=1 width=0) (actual
time=20.242..20.242 rows=1 loops=1)
-> Bitmap Heap Scan on read_acls_cache (cost=2176.10..35022.98
rows=52517 width=0) (actual time=6.937..15.025 rows=51446 loops=1)
Recheck Cond: ((users_md5)::text =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text)
-> Bitmap Index Scan on read_acls_cache_users_md5_idx
(cost=0.00..2162.97 rows=52517 width=0) (actual time=6.835..6.835
rows=51446 loops=1)
Index Cond: ((users_md5)::text =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text)
Total runtime: 20.295 ms
(6 rows)
ben
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance