Re: Query Performance / Planner estimate off

Justin Pryzby <pryzby@xxxxxxxxxxxxx> · Thu, 22 Oct 2020 01:37:01 -0500

On Wed, Oct 21, 2020 at 04:42:02PM +0200, Mats Olsen wrote:
> On 10/21/20 2:38 PM, Sebastian Dressler wrote:
> > > On 20. Oct 2020, at 11:37, Mats Julian Olsen <mats@xxxxxxxxxxxxxxxxx
> > > 
> > > [...]
> > > 
> > > 1) Vanilla plan (16 min) : https://explain.depesz.com/s/NvDR
> > > <https://explain.depesz.com/s/NvDR>
> > > 2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK
> > > <https://explain.depesz.com/s/buKK>
> > > 3) enable_nestloop=off; enable_seqscan=off (2 min):
> > > https://explain.depesz.com/s/0WXx
> > > <https://explain.depesz.com/s/0WXx>
> > > 
> > > How can I get Postgres not to loop over 12M rows?
> > 
> > I looked at the plans and your config and there are some thoughts I'm
> > having:
> > 
> > - The row estimate is off, as you possibly noticed. This can be possibly
> > solved by raising `default_statistics_target` to e.g. 2500 (we typically
> > use that) and run ANALYZE
> I've `set default_statistics_target=2500` and ran analyze on both tables
> involved, unfortunately the plan is the same. The columns we use for joining
> here are hashes and we expect very few duplicates in the tables. Hence I
> think extended statistics (storing most common values and histogram bounds)
> aren't useful for this kind of data. Would you say the same thing?

In postgres, extended statistics means "MV stats objects", not MCV+histogram,
which are "simple statistics", like ndistinct.

Your indexes maybe aren't ideal for this query, as mentioned.
The indexes that do exist might also be inefficient, due to being unclustered,
or bloated, or due to multiple columns.

These look redundant (which doesn't matter for this the query):

Partition key: RANGE (block_number)
Indexes:
    "transactions_block_number_btree" btree (block_number DESC)
    "transactions_block_number_hash_key" UNIQUE CONSTRAINT, btree (block_number, hash)
    "transactions_block_number_time" btree (hash, block_number)

Maybe that would be an index just on "hash", which might help here.

Possibly you'd want to try to use a BRIN index on timestamp (or maybe
block_number?).

Maybe you'd want to VACUUM the table to allow index-only scan on the hash
columns ?

Maybe you'd want to check if reindexing reduces the index size ?  We don't know
if the table gets lots of UPDATE/DELETE or if any of the columns have high
logical vs physical "correlation".
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram

Have you ANALYZED the partitioned parent recently ?
This isn't handled by autovacuum.

-- 
Justin