no MCV list of tiny table with unique columns

Justin Pryzby <pryzby@xxxxxxxxxxxxx> · Wed, 2 Nov 2016 13:53:18 -0500

We have a report query which joins (multiple times, actually) against this
trivial, tiny table:

ts=# \d bsm_to_switch
Table "public.bsm_to_switch"
 Column | Type | Modifiers 
--------+------+-----------
 bsm    | text | not null
 switch | text | not null

ts=# SELECT length(bsm), length(switch) FROM bsm_to_switch;
 length | length 
--------+--------
     10 |      6
     10 |      6
(2 rows)

The column values are distinct.

I believe the join is being (badly) underestimated, leading to a crappy plan
involving multiple nested loop joins, which takes 2.5 hours instead of a
handful of seconds; I believe that might be resolved by populating its MCV
list..

..however, on reading commands/analyze.c, the issue is these columns have no
duplicates, and also postgres decides that "since the number of distinct rows
is greater than 10% of the total number of rows", that ndistinct should be -1
(meaning it scales with the table size).  That's fine, except that it then
effectively precludes populating the MCV list.

|	if (nmultiple == 0)
|	{
|		/*
|		 * If we found no repeated non-null values, assume it's a unique
|		 * column; but be sure to discount for any nulls we found.
|		 */
|		stats->stadistinct = -1.0 * (1.0 - stats->stanullfrac);
|	}
|	else if (track_cnt < track_max && toowide_cnt == 0 &&
|			 nmultiple == track_cnt)
|	{
|		/*
|		 * Our track list includes every value in the sample, and every
|		 * value appeared more than once.  Assume the column has just
|		 * these values.  (This case is meant to address columns with
|		 * small, fixed sets of possible values, such as boolean or enum
|		 * columns.  If there are any values that appear just once in the
|		 * sample, including too-wide values, we should assume that that's
|		 * not what we're dealing with.)
|		 */
|		stats->stadistinct = track_cnt;
|	}

ts=# SELECT attname, inherited, null_frac, avg_width, n_distinct, most_common_vals FROM pg_stats WHERE tablename='bsm_to_switch';
 attname | inherited | null_frac | avg_width | n_distinct | most_common_vals 
---------+-----------+-----------+-----------+------------+------------------
 bsm     | f         |         0 |        11 |         -1 | 
 switch  | f         |         0 |         7 |         -1 | 
(2 rows)

Any ideas?  I tried setting n_distinct=2, but that seems to not have any effect
within ANALYZE itself.

ts=# SELECT attname, inherited, null_frac, avg_width, n_distinct, most_common_vals FROM pg_stats WHERE tablename='bsm_to_switch';
 attname | inherited | null_frac | avg_width | n_distinct | most_common_vals 
---------+-----------+-----------+-----------+------------+------------------
 bsm     | f         |         0 |        11 |          2 | 
 switch  | f         |         0 |         7 |          2 | 
(2 rows)

Thanks in advance.

Justin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance