Le 2013-04-16 à 22:51, François Beausoleil a écrit : > Hi all! > > I track Twitter followers in my database. I have the following table: > > # \d persona_followers > Table "public.persona_followers" > Column | Type | Modifiers > -------------+-----------------------------+----------- > service_id | bigint | not null > follower_id | bigint | not null > valid_at | timestamp without time zone | > Indexes: > "persona_followers_pkey" PRIMARY KEY, btree (service_id, follower_id) > > The table IS NOT partitioned. A run with the following query: INSERT INTO "persona_followers" SELECT "service_id", "follower_id", now() FROM ( SELECT * FROM ( SELECT DISTINCT "service_id", "follower_id" FROM "persona_followers_import" WHERE "service_id" IS NOT NULL OR "follower_id" IS NOT NULL EXCEPT SELECT "service_id", "follower_id" FROM "persona_followers") AS "t1") AS "t1" results in http://explain.depesz.com/s/Y1c Insert on public.persona_followers (cost=139261.12..20483497.65 rows=6256498 width=16) (actual time=4729255.535..4729255.535 rows=0 loops=1) Buffers: shared hit=33135295 read=4776921 -> Subquery Scan on t1 (cost=139261.12..20483497.65 rows=6256498 width=16) (actual time=562265.156..578844.999 rows=6819520 loops=1) Output: t1.service_id, t1.follower_id, now() Buffers: shared hit=36891 read=3572263 -> HashSetOp Except (cost=139261.12..20389650.18 rows=6256498 width=16) (actual time=562265.127..566513.759 rows=6819520 loops=1) Output: "*SELECT* 1".service_id, "*SELECT* 1".follower_id, (0) Buffers: shared hit=36891 read=3572263 -> Append (cost=139261.12..17054024.97 rows=667125042 width=16) (actual time=4090.462..320879.545 rows=667689321 loops=1) Buffers: shared hit=36891 read=3572263 -> Subquery Scan on "*SELECT* 1" (cost=139261.12..264391.09 rows=6256498 width=16) (actual time=4090.461..7798.334 rows=6820784 loops=1) Output: "*SELECT* 1".service_id, "*SELECT* 1".follower_id, 0 Buffers: shared hit=36891 -> HashAggregate (cost=139261.12..201826.11 rows=6256498 width=16) (actual time=4090.459..6795.009 rows=6820784 loops=1) Output: persona_followers_import.service_id, persona_followers_import.follower_id Buffers: shared hit=36891 -> Seq Scan on francois.persona_followers_import (cost=0.00..105137.75 rows=6824675 width=16) (actual time=0.017..1344.916 rows=6824700 loops=1) Output: persona_followers_import.service_id, persona_followers_import.follower_id Filter: ((persona_followers_import.service_id IS NOT NULL) OR (persona_followers_import.follower_id IS NOT NULL)) Buffers: shared hit=36891 -> Subquery Scan on "*SELECT* 2" (cost=0.00..16789633.88 rows=660868544 width=16) (actual time=6.694..238761.499 rows=660868537 loops=1) Output: "*SELECT* 2".service_id, "*SELECT* 2".follower_id, 1 Buffers: shared read=3572263 -> Seq Scan on public.persona_followers (cost=0.00..10180948.44 rows=660868544 width=16) (actual time=6.693..137929.808 rows=660868537 loops=1) Output: public.persona_followers.service_id, public.persona_followers.follower_id Buffers: shared read=3572263 Total runtime: 4729338.157 ms 1h20m for 6.8 million rows inserted. The estimates are spot on, since I had just run VACUUM ANALYZE on both tables prior to doing this test. Running the original query now, SELECT * FROM ... WHERE NOT EXISTS(...). Bye, François
<<attachment: smime.p7s>>