Thanks guys, here's the information you requested:

psql:postgres@cipafilter = show work_mem;
 work_mem
──────────
 10MB
(1 row)

psql:postgres@cipafilter = select version();
                                                version
───────────────────────────────────────────────────────────────────────────────────────────────────────
 PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
(1 row)

On Mon, Feb 13, 2017 at 1:26 PM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
> On Mon, Feb 13, 2017 at 9:40 AM, David Hinkle <hinkle@xxxxxxxxxxxxxx> wrote:
>>
>> I'm having trouble with purges related to a large table. The delete
>> query consumes RAM until Postgres crashes due to OOM. I have a very
>> large table called log_raw. There are half a dozen related tables,
>> such as 'urls' and 'titles'. log_raw.urlid = urls.urlid, and urls
>> contains the text of the various URLs, for example.
>>
>> Each time I try to purge these side tables, the unit OOMs.
>>
>> psql:postgres@cipafilter = explain DELETE FROM titles WHERE NOT EXISTS
>> ( SELECT 1 FROM log_raw WHERE log_raw.titleid = titles.titleid );
>>                                        QUERY PLAN
>> ─────────────────────────────────────────────────────────────────────────────────────────
>>  Delete on titles  (cost=22166473.44..24850954.67 rows=870382 width=12)
>>    ->  Hash Anti Join  (cost=22166473.44..24850954.67 rows=870382 width=12)
>>          Hash Cond: (titles.titleid = log_raw.titleid)
>>          ->  Seq Scan on titles  (cost=0.00..17871.64 rows=870664 width=10)
>>          ->  Hash  (cost=12744792.64..12744792.64 rows=542011264 width=10)
>>                ->  Seq Scan on log_raw  (cost=0.00..12744792.64 rows=542011264 width=10)
>> (6 rows)
>>
>> psql:postgres@cipafilter = select count(*) from (select titleid from
>> log_raw group by titleid) as a;
>>  count
>> ────────
>>  872210
>> (1 row)
>>
>> cipafilter=# select n_distinct from pg_stats where tablename =
>> 'log_raw' and attname = 'titleid';
>>  n_distinct
>> ------------
>>         282
>> (1 row)
>>
>> The planning data is wildly low for each of these fields, and I wonder
>> if because of that error the planner thinks it can keep all these IDs
>> in RAM while it works. ANALYZE doesn't fix it. Increasing the
>> statistics target improves the data in n_distinct, but not
>> considerably: increasing it 3- or 4-fold still leaves it wildly off.
>> ALTER TABLE ... SET (n_distinct = ...) doesn't seem to be used by the
>> planner, as it doesn't change any of the plans I've generated or seem
>> to be taken into account in the row estimates. I'm out of ideas.
>> Anybody have any ideas?
>
> Your data on log_raw.titleid is probably clustered, so that any given page
> of the table all has the same value for titleid. This really messes up the
> sampling algorithm used by ANALYZE. To overcome that, you would have to
> increase the statistics target by 3 or 4 orders of magnitude, not a factor
> of 3 or 4.
>
> However, that doesn't seem to be the actual problem. Surprisingly enough,
> a hash anti-join doesn't automatically de-duplicate the hash table as it
> is being built. So n_distinct correctly does not have an influence on the
> estimated RAM usage, because it doesn't influence the actual RAM usage
> either.
>
> It sounds like your work_mem is set way too high. What is it set to? And
> what version of PostgreSQL are you using?
>
> Cheers,
>
> Jeff

--
David Hinkle
Senior Software Developer
Phone: 800.243.3729x3000
Email: hinkle@xxxxxxxxxxxxxx
Hours: Mon-Fri 8:00AM-5:00PM (CT)
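
For readers following the statistics discussion above, a minimal sketch of what raising the per-column statistics target and overriding n_distinct could look like for the column in the plan. This is not something the thread settled on: 10000 is simply the largest value ALTER TABLE ... SET STATISTICS accepts, the n_distinct value is just the count observed in the GROUP BY query above, and the override only takes effect at the next ANALYZE.

ALTER TABLE log_raw ALTER COLUMN titleid SET STATISTICS 10000;
-- or pin the estimate directly to the observed distinct count
ALTER TABLE log_raw ALTER COLUMN titleid SET (n_distinct = 872210);
ANALYZE log_raw (titleid);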
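
Likewise, a possible workaround for the hash anti-join behaviour Jeff describes, sketched under the assumption that titleid is the only join column; the temp-table name is invented and this was not tested against the schema in the thread. The idea is to de-duplicate log_raw.titleid once, so whatever hash gets built only has to hold the roughly 870k distinct IDs rather than all 542 million log_raw rows.

CREATE TEMP TABLE live_titleids AS
    SELECT DISTINCT titleid FROM log_raw;   -- ~870k rows instead of 542M

ANALYZE live_titleids;

DELETE FROM titles t
WHERE NOT EXISTS (
    SELECT 1 FROM live_titleids l WHERE l.titleid = t.titleid
);

DROP TABLE live_titleids;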