Thanks, Tom, for the clear explanation.
Unfortunately I don't see any actual improvement. I'm on PG 11 and I ran the following commands:
ALTER TABLE subscriptions ALTER tags SET STATISTICS 1000;
ANALYZE subscriptions;
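To double-check that the new per-column target actually took effect, here's a quick sketch against the system catalogs (same table/column names as above):

SELECT attstattarget
FROM pg_attribute
WHERE attrelid = 'subscriptions'::regclass
  AND attname = 'tags';
-- expect 1000 here; -1 would mean the default target is still in effect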
However, the estimation error remains pretty much the same (slightly worse afterwards, actually). Any ideas?
On Sun, Feb 2, 2020 at 6:11 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Marco Colli <collimarco91@xxxxxxxxx> writes:
> Let's say that you have a simple query like the following on a large table
> (for a multi-tenant application):
> SELECT "subscribers".* FROM "subscribers" WHERE "subscribers"."project_id"
> = 123 AND (tags @> ARRAY['de']::varchar[]);
> If you run EXPLAIN ANALYZE you can see that stats are completely wrong.
> For example I get an expected count of 3,500 rows whereas the actual
> result is 20 rows. This also results in bad query plans...
> In a previous discussion someone said that this wrong estimate is because
> @> uses a fixed selectivity of 0.001, **regardless of actual data**!!
> Is that true?
Hasn't been true since 9.2.
You might get some insight from looking into the most_common_elems,
most_common_elem_freqs, and elem_count_histogram fields of the pg_stats
view.
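For instance, a quick look at those fields for your column might go like this (a sketch, reusing the table/column names from your query):

SELECT most_common_elems,
       most_common_elem_freqs,
       elem_count_histogram
FROM pg_stats
WHERE tablename = 'subscribers'
  AND attname = 'tags';
-- note: most_common_elem_freqs carries two or three extra trailing values
-- (the min and max per-element frequencies, plus the null-element frequency)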
It seems likely to me that increasing the statistics target for this array
column would help. IIRC, estimates for values that don't show up in
most_common_elems are going to depend on the lowest frequency that *does*
show up there ... so if you want better resolution for non-common values,
you need more entries.
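For example (a sketch; 10000 is the maximum allowed target, and larger targets make ANALYZE slower):

ALTER TABLE subscribers ALTER COLUMN tags SET STATISTICS 10000;
ANALYZE subscribers;
-- then re-run EXPLAIN ANALYZE on the query to see whether the
-- row estimate for the @> condition gets closer to reality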
regards, tom lane