Search Postgresql Archives

Re: Bad planning data resulting in OOM killing of postgres

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Yep, 420ish million records out of 540 million records have a titleid
of 1.   There are about 880,000 other unique values, but most of the
records are 1.   Of course, n_distinct is only 292.   I'm surprised
it's not eliminating the duplicates while it builds that hash table.

This is what I'm doing for a work around right now.  Getting
n_distinct right seems to be preventing the system from breaking.
It's going to be executed once a week during the weekly maintenance.
It's setting the n_distinct of each column to the number of rows in
the associated table.

    CREATE OR REPLACE FUNCTION patch_ndistinct(_table varchar, _column
varchar, _string_table varchar)
      RETURNS real AS
    $$
         DECLARE _cnt REAL;
    BEGIN
         SELECT reltuples INTO _cnt from pg_class where relname = _string_table;
       EXECUTE 'ALTER TABLE ' || _table || ' ALTER COLUMN ' || _column
|| ' SET (n_distinct=' || _cnt || ')';
       RETURN _cnt;
    END
    $$ LANGUAGE plpgsql;
    select patch_ndistinct('log_raw', 'titleid', 'titles');
    select patch_ndistinct('log_raw', 'urlid', 'urls');
    select patch_ndistinct('log_raw', 'hostid', 'hosts');
    ANALYZE log_raw;

On Thu, Feb 16, 2017 at 10:54 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> David Hinkle <hinkle@xxxxxxxxxxxxxx> writes:
>> Tom, there are three columns in this table that exhibit the problem,
>> here is the statistics data after an analyze, and the real data to
>> compare it to.
>
>>  attname | n_distinct |  most_common_freqs
>
>>  titleid |        292 | {0.767167}
>
> Ouch.  That's saying there's some single value of titleid that accounts
> for more than three-quarters of the entries ... does that square with
> reality?  That'd certainly explain why a hash join goes nuts.
>
>                         regards, tom lane



-- 
David Hinkle

Senior Software Developer

Phone:  800.243.3729x3000

Email:  hinkle@xxxxxxxxxxxxxx

Hours:  Mon-Fri   8:00AM-5:00PM (CT)


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux