Re: UPDATE using 3 medium-sized tables causes runaway hash table and fills disk

Tom Lane <tgl@xxxxxxxxxxxxx> · Mon, 29 Apr 2013 12:52:40 -0400

Rowan Collins <rowan.collins@xxxxxxxxx> writes:
> I've come upon some very strange behaviour with an UPDATE query which 
> causes Postgres to consume all the disk space on the server for no 
> apparent reason.

The short answer to that is "analyze your data".  Particularly when
you're using temp tables, for which auto-analyze cannot help you.

The reason this plan is so disastrous is that the planner chooses to use
lf.number_in_party = co.num_adults_searched as the inner join qual,
after having used ((num_children_searched = 0) AND (num_infants_searched
= 0) AND (supplier_code = 'AMA'::text)) as a scan qual.  In the absence
of any stats, neither choice is unreasonable.  Given the actual data,
however, *neither of those conditions has any selectivity whatsoever*,
and so a join that's predicted to yield 293 rows is actually going to
yield 900 million rows occupying something like 90GB.  Even then you'd
not have been seeing the data spilled to disk, except the planner also
didn't know that cob_offer_id IS NULL is likewise totally without
selectivity and thus underestimated the size of the upper hash table,
so that (with default work_mem) the upper hash join ends up switching
to batch mode and thus having to spill most of the 90GB to disk.

If you just stick in some ANALYZEs the results are way better:

 Hash Join  (cost=2777.90..9352.23 rows=63362 width=73) (actual time=38.902..183.116 rows=63362 loops=1)
   Hash Cond: ((lf.number_in_party = co.num_adults_searched) AND (lfm.central_offer_id = co.central_offer_id) AND (lfm.credential_set_id = co.credential_set_id))
   ->  Hash Join  (cost=1959.95..5358.02 rows=63362 width=37) (actual time=26.635..100.670 rows=63362 loops=1)
         Hash Cond: (lfm.low_fares_row_id = lf.low_fares_row_id)
         ->  Seq Scan on test_mappings lfm  (cost=0.00..976.62 rows=63362 width=17) (actual time=0.014..13.390 rows=63362 loops=1)
               Filter: (cob_offer_id IS NULL)
         ->  Hash  (cost=923.98..923.98 rows=56398 width=20) (actual time=26.590..26.590 rows=56398 loops=1)
               Buckets: 2048  Batches: 4  Memory Usage: 777kB
               ->  Seq Scan on test_low_fares lf  (cost=0.00..923.98 rows=56398 width=20) (actual time=0.008..11.484 rows=56398 loops=1)
   ->  Hash  (cost=413.48..413.48 rows=15970 width=36) (actual time=12.225..12.225 rows=15970 loops=1)
         Buckets: 2048  Batches: 2  Memory Usage: 566kB
         ->  Seq Scan on test_cob_offers co  (cost=0.00..413.48 rows=15970 width=36) (actual time=0.021..7.018 rows=15970 loops=1)
               Filter: ((num_children_searched = 0) AND (num_infants_searched = 0) AND (supplier_code = 'AMA'::text))
 Total runtime: 187.090 ms

but there's really no way for the planner to select the right plan
when some of the conditions in the query are selective and some are
not and it has no stats to help it tell the difference.

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general