2017-04-05 10:13 GMT+02:00 Daniel Westermann <daniel.westermann@xxxxxxxxxxxxxxxx>:
2017-04-05 9:28 GMT+02:00 Daniel Westermann <daniel.westermann@dbi-services.com >:>>what is result of EXPLAIN statement for slow and fast cases?>>>>regards>>>>PavelFor work_mem=32MBexplain (analyze,verbose,buffers) select count(user_id) from users where user_id not in ( select id from ids);
QUERY PLAN
------------------------------------------------------------ ------------------------------ ------------------------------ --------------
Aggregate (cost=83175.01..83175.02 rows=1 width=8) (actual time=7945.592..7945.593 rows=1 loops=1)
Output: count(users.user_id)
Buffers: shared read=29425
-> Seq Scan on public.users (cost=16925.01..79425.01 rows=1500000 width=4) (actual time=1928.665..5888.645 rows=2499999 loops=1)
Output: users.user_id, users.username
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 500001
Buffers: shared read=29425
SubPlan 1
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=1000001 width=4) (actual time=11.111..868.382 rows=1000001 loops=1)
Output: ids.id
Buffers: shared read=4425
Planning time: 187.396 ms
Execution time: 7948.108 ms
(14 rows)
Time: 8244.493 msFor work_mem='16MB' it does not complete with analyze in on hour. For explain only:explain (verbose) select count(user_id) from users where user_id not in ( select id from ids);
QUERY PLAN
------------------------------------------------------------ ----------------------------
Aggregate (cost=38748092500.00..38748092500.01 rows=1 width=8)
Output: count(users.user_id)
-> Seq Scan on public.users (cost=0.00..38748088750.00 rows=1500000 width=4)
Output: users.user_id, users.username
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..23332.01 rows=1000001 width=4)
Output: ids.id
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=1000001 width=4)
Output: ids.id
(10 rows)Yes:\d ids
Table "public.ids"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
Indexes:
"i_ids" UNIQUE, btree (id)
hmm .. NOT IN is just bad :(
The second is slow becase table ids is stored in temp file. and it is repeatedly read from file. In first case, ids table is stored in memory.
SELECT count(user_id) FROM users WHERE user_id NOT EXISTS(SELECT id from ids where id = user_id)
Regards
Pavel