On Wed, Oct 21, 2020, 8:42 AM Mats Olsen <mats@xxxxxxxxxxxxxxxxx> wrote:
On 10/21/20 2:38 PM, Sebastian Dressler wrote:
Hi Mats,

I've `set default_statistics_target=2500` and ran ANALYZE on both tables involved; unfortunately the plan is the same. The columns we use for joining here are hashes, and we expect very few duplicates in the tables. Hence I think larger statistics targets (storing more most-common values and histogram bounds) aren't useful for this kind of data. Would you say the same thing?
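For reference, roughly the commands in question (table names are placeholders for the two tables in the join):

    SET default_statistics_target = 2500;  -- session-level; ANALYZE picks this up
    ANALYZE table_a;
    ANALYZE table_b;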
On 20. Oct 2020, at 11:37, Mats Julian Olsen <mats@xxxxxxxxxxxxxxxxx> wrote:
[...]
1) Vanilla plan (16 min): https://explain.depesz.com/s/NvDR
2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK
3) enable_nestloop=off; enable_seqscan=off (2 min): https://explain.depesz.com/s/0WXx
How can I get Postgres not to loop over 12M rows?
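For completeness, plans 2 and 3 above were produced with session settings along these lines (the query itself is elided):

    -- plan 2: disable nested loops for the session
    SET enable_nestloop = off;
    -- plan 3: additionally disable sequential scans
    SET enable_seqscan = off;
    -- then run the (elided) join query under EXPLAIN
    EXPLAIN (ANALYZE, BUFFERS) SELECT ...;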
I looked at the plans and your config, and I have a few thoughts:
- The row estimate is off, as you possibly noticed. This can possibly be solved by raising `default_statistics_target` to e.g. 2500 (we typically use that) and running ANALYZE
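If raising it globally is too heavy, a per-column target on just the join keys also works; a minimal sketch, with placeholder table/column names:

    -- raise the statistics target for one join column only
    ALTER TABLE table_a ALTER COLUMN join_hash SET STATISTICS 2500;
    ANALYZE table_a;  -- re-sample so the new target takes effect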
Have you checked whether ndistinct is roughly accurate? It can be set manually on a column: a positive value is taken as the distinct count itself, while a negative value is treated as a multiplier on reltuples (e.g. -1 means every row is distinct), so the estimate keeps scaling as the table grows.
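A sketch of how to inspect and override it, again with placeholder names:

    -- inspect the planner's current distinct-count estimate for the join column
    SELECT n_distinct FROM pg_stats
    WHERE tablename = 'table_a' AND attname = 'join_hash';

    -- hash columns are near-unique, so pin the fraction: -1 means all rows distinct
    ALTER TABLE table_a ALTER COLUMN join_hash SET (n_distinct = -1);
    ANALYZE table_a;  -- the override takes effect at the next ANALYZE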