I manually updated the pg_statistics data by literally set it to an appropriate amount, and the planner picked a new plan and the new plan worked. Any idea what I should do about this? Is manually updating these values my best bet? psql:daemon@cipafilter = update pg_statistic set stadistinct = 850000 where starelid = 16678; psql:daemon@cipafilter = EXPLAIN (ANALYZE, BUFFERS) delete from titles WHERE NOT EXISTS ( SELECT 1 FROM log_raw WHERE log_raw.titleid = titles.titleid ); QUERY PLAN ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── Delete on titles (cost=111982485.81..117653772.60 rows=26291 width=12) (actual time=726777.769..726777.769 rows=0 loops=1) Buffers: shared hit=769918 read=7323762 dirtied=1, temp read=1944017 written=2177366 -> Merge Anti Join (cost=111982485.81..117653772.60 rows=26291 width=12) (actual time=726777.766..726777.766 rows=0 loops=1) Merge Cond: (titles.titleid = log_raw.titleid) Buffers: shared hit=769918 read=7323762 dirtied=1, temp read=1944017 written=2177366 -> Index Scan using titles_pkey on titles (cost=0.42..44451.84 rows=876291 width=10) (actual time=0.006..316.375 rows=874829 loops=1) Buffers: shared hit=769000 dirtied=1 -> Materialize (cost=109899462.67..112609518.99 rows=542011264 width=10) (actual time=545806.339..687770.786 rows=425297005 loops=1) Buffers: shared hit=918 read=7323762, temp read=1944017 written=2177366 -> Sort (cost=109899462.67..111254490.83 rows=542011264 width=10) (actual time=545806.336..652402.716 rows=425297005 loops=1) Sort Key: log_raw.titleid Sort Method: external merge Disk: 11013224kB Buffers: shared hit=918 read=7323762, temp read=1944017 written=2177366 -> Seq Scan on log_raw (cost=0.00..12744792.64 rows=542011264 width=10) (actual time=0.003..274920.970 rows=544670242 loops=1) Buffers: shared hit=918 read=7323762 Planning time: 0.158 ms Execution time: 727533.213 ms (17 rows) On Mon, Feb 13, 2017 at 3:57 PM, David Hinkle <hinkle@xxxxxxxxxxxxxx> wrote: > I managed to get this version to finish: > > psql:postgres@cipafilter = explain (ANALYZE, BUFFERS) select count(*) > from (select titleid from log_raw group by titleid) as a; > QUERY PLAN > ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── > Aggregate (cost=14099827.15..14099827.16 rows=1 width=0) (actual > time=248805.549..248805.549 rows=1 loops=1) > Buffers: shared hit=598 read=7324082 dirtied=34 > -> HashAggregate (cost=14099820.80..14099823.62 rows=282 width=4) > (actual time=248504.756..248760.382 rows=874750 loops=1) > Group Key: log_raw.titleid > Buffers: shared hit=598 read=7324082 dirtied=34 > -> Seq Scan on log_raw (cost=0.00..12744792.64 > rows=542011264 width=4) (actual time=0.002..145554.907 rows=544654818 > loops=1) > Buffers: shared hit=598 read=7324082 dirtied=34 > Planning time: 0.072 ms > Execution time: 248807.285 ms > (9 rows) > > On Mon, Feb 13, 2017 at 3:47 PM, David Hinkle <hinkle@xxxxxxxxxxxxxx> wrote: >> psql:postgres@cipafilter = EXPLAIN (ANALYZE, BUFFERS) select titleid >> from titles WHERE NOT EXISTS ( SELECT 1 FROM log_raw WHERE >> log_raw.titleid = titles.titleid ); >> server closed the connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> The connection to the server was lost. Attempting reset: Failed. >> >> Nope, that pops too. The query runs for a long time at a somewhat >> normal rate of ram consumption, using ~1G of RSS then suddenly spikes >> to about 6G, at which point the OOM killer pops it. Box has 8G of ram >> and 4G of swap. >> >> On Mon, Feb 13, 2017 at 3:21 PM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote: >>> On Mon, Feb 13, 2017 at 12:43 PM, David Hinkle <hinkle@xxxxxxxxxxxxxx> >>> wrote: >>>> >>>> Thanks Jeff, >>>> >>>> No triggers or foreign key constrains: >>>> >>>> psql:postgres@cipafilter = \d+ titles >>>> Table "public.titles" >>>> Column │ Type │ Modifiers >>>> │ Storage │ Stats target │ Description >>>> >>>> ─────────┼───────────────────┼──────────────────────────────────────────────────────────┼──────────┼──────────────┼───────────── >>>> title │ character varying │ >>>> │ extended │ │ >>>> titleid │ integer │ not null default >>>> nextval('titles_titleid_seq'::regclass) │ plain │ │ >>>> Indexes: >>>> "titles_pkey" PRIMARY KEY, btree (titleid) >>>> "titles_md5_title_idx" btree (md5(title::text)) >>>> >>>> Do you see anything in there that would be problematic? >>> >>> >>> >>> I'm out of ideas here. What happens if you just select the rows, rather >>> than deleting them? Does it have memory problems then? If not, can you >>> post the explain (analyze, buffers) of doing that? >>> >>> Cheers, >>> >>> Jeff >> >> >> >> -- >> David Hinkle >> >> Senior Software Developer >> >> Phone: 800.243.3729x3000 >> >> Email: hinkle@xxxxxxxxxxxxxx >> >> Hours: Mon-Fri 8:00AM-5:00PM (CT) > > > > -- > David Hinkle > > Senior Software Developer > > Phone: 800.243.3729x3000 > > Email: hinkle@xxxxxxxxxxxxxx > > Hours: Mon-Fri 8:00AM-5:00PM (CT) -- David Hinkle Senior Software Developer Phone: 800.243.3729x3000 Email: hinkle@xxxxxxxxxxxxxx Hours: Mon-Fri 8:00AM-5:00PM (CT) -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general