Yep, roughly 420 million of the 540 million records have a titleid of 1. There are about 880,000 other unique values, but most of the records are 1. Of course, n_distinct comes out as only 292. I'm surprised it's not eliminating the duplicates while it builds that hash table.

This is what I'm doing as a workaround right now; getting n_distinct right seems to keep the system from breaking. It will be executed once a week during the weekly maintenance, and it sets the n_distinct of each column to the number of rows in the associated table:

CREATE OR REPLACE FUNCTION patch_ndistinct(_table varchar, _column varchar, _string_table varchar)
RETURNS real AS $$
DECLARE
    _cnt REAL;
BEGIN
    -- Use the planner's row estimate for the associated table as the distinct count
    SELECT reltuples INTO _cnt FROM pg_class WHERE relname = _string_table;
    -- Identifiers are concatenated directly; fine for these hard-coded calls,
    -- quote_ident() would be safer for general use
    EXECUTE 'ALTER TABLE ' || _table || ' ALTER COLUMN ' || _column
         || ' SET (n_distinct=' || _cnt || ')';
    RETURN _cnt;
END
$$ LANGUAGE plpgsql;

select patch_ndistinct('log_raw', 'titleid', 'titles');
select patch_ndistinct('log_raw', 'urlid', 'urls');
select patch_ndistinct('log_raw', 'hostid', 'hosts');
ANALYZE log_raw;

On Thu, Feb 16, 2017 at 10:54 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> David Hinkle <hinkle@xxxxxxxxxxxxxx> writes:
>> Tom, there are three columns in this table that exhibit the problem,
>> here is the statistics data after an analyze, and the real data to
>> compare it to.
>
>> attname | n_distinct | most_common_freqs
>> titleid | 292        | {0.767167}
>
> Ouch.  That's saying there's some single value of titleid that accounts
> for more than three-quarters of the entries ... does that square with
> reality?  That'd certainly explain why a hash join goes nuts.
>
>                         regards, tom lane

--
David Hinkle
Senior Software Developer
Phone: 800.243.3729 x3000
Email: hinkle@xxxxxxxxxxxxxx
Hours: Mon-Fri 8:00AM-5:00PM (CT)
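
PS: If it's useful, here's a rough sanity check I'd run after the ANALYZE (untested here, but should be close) to confirm the override actually took. The first query reads the per-column override back out of pg_attribute.attoptions, and the second shows the statistics the planner will actually use from pg_stats; table and column names are the same ones used above.

-- The ALTER TABLE ... SET (n_distinct=...) override is stored in attoptions
SELECT attname, attoptions
FROM pg_attribute
WHERE attrelid = 'log_raw'::regclass
  AND attname IN ('titleid', 'urlid', 'hostid');

-- What ANALYZE produced and what the planner will see
SELECT attname, n_distinct, most_common_freqs[1] AS top_mcv_freq
FROM pg_stats
WHERE tablename = 'log_raw'
  AND attname IN ('titleid', 'urlid', 'hostid');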