Re: Help needed on optimizing query

Tom Lane <tgl@xxxxxxxxxxxxx> · Mon, 13 Apr 2020 11:33:10 -0400

Mark Steben <mark.steben@xxxxxxxxxxxxxxxxx> writes:
>          ->  *Materialize  (cost=0.57..654.71 rows=5190 width=8) (actual
> time=0.738..269.930 rows=5797068 loops=1)*
>                ->  Nested Loop Left Join  (cost=0.57..641.73 rows=5190
> width=8) (actual time=0.736..6.183 rows=5190 loops=1)

> My specific question to you is: I notice in the boldened lines the
> discrepancy between the 'estimated row count' and 'actual row count' Most
> blatant is one Materialize where estimated rowcount as calculated in the
> explain is 5190 and actual rowcount is 5,797.068.  How do I fix this so
> that the estimated is closer to the actual.

I don't think there's any estimation failure here at all: notice that the
input left-join node's estimate is dead on.  The reason the Materialize's
output row count is higher is that the same rows are being read from it
over and over, which is something that a Merge Join will do to its
right-hand input when the left-hand input has a lot of duplicate join
keys.  The reason the planner stuck a Materialize here is exactly to
make that as cheap as it could.

It looks to me like the really bad aspect of this plan is that
the most aggressive filter can't be applied till the very end:

         Filter: ((roles_users.id IS NOT NULL) OR (access_keys_users.id IS NOT NULL) OR (accounts_users.id IS NOT NULL))
         Rows Removed by Filter: 5613165

You're likely not going to be able to do much about that without
a significant rethinking of your table layout.  But having to do
a seven-way join in a performance-critical query is already a sign
that you're in for pain.

			regards, tom lane