Re: Joining 16 tables seems slow

Chris Travers <chris.travers@xxxxxxxxx> · Tue, 12 Sep 2017 15:36:21 +0200

On Tue, Sep 12, 2017 at 3:15 PM, Frank Millman <frank@xxxxxxxxxxxx> wrote:

2017-09-12 12:39 GMT+02:00 Pavel Stehule <pavel.stehule@xxxxxxxxx>:

  2017-09-12 12:25 GMT+02:00 Frank 
  Millman <frank@xxxxxxxxxxxx>:

    Pavel Stehule wrote:

    2017-09-12 9:36 GMT+02:00 Frank Millman <frank@xxxxxxxxxxxx>:

      Pavel 
      Stehule wrote:

      > 
      > 2017-09-12 8:45 GMT+02:00 Frank Millman <frank@xxxxxxxxxxxx>:

        I am using 9.4.4 on Fedora 22.

        I am experimenting with optimising a SQL statement. One version 
        uses 4 LEFT JOIN’s and a 5-way CASE statement in the body. The second 
        moves the filtering into the JOIN section, and I end up with 16 LEFT 
        JOIN’s and no CASE statements.

        My test involves selecting a single row. Both versions work. The 
        first version takes 0.06 seconds. The second takes 0.23 seconds. On 
        further experimentation, the time for the second one seems to taken in 
        setting up the joins, because if I omit selecting anything from the 
        joined tables, it still takes 0.23 
      seconds.
      > 
      > please send result of explain analyze
      > 
      > you can experimentally try increase FROM_COLLAPSE_LIMIT to some 
      higher number 14 maybe 16
      > 
      I tried increasing FROM_COLLAPSE_LIMIT, but it made no 
      difference.

      I have attached files containing my SQL command, and the results of 
      EXPLAIN ANALYSE

    > please use https://explain.depesz.com/ for both plans (slow, 
    fast)

    Here are the results - 

    sql_slow - https://explain.depesz.com/s/9vn3

    sql_fast - https://explain.depesz.com/s/oW0F

  I don't see any issue there - it looks like some multi dimensional query 
  and it should not be well optimized due not precious estimations. The slow 
  query has much more complex - some bigger logic is under nested loop - where 
  estimation is not fully correct, probably due dependencies between 
  columns.

  what does SET enable_nestloop to off;
> 
> from statistics - the ar_tran_inv table is scanned 6x in slow query 
and 2times in fast query. Maybe there should be some index 
>

Setting enable_nestloop to off makes no difference.

Setting from_collapse_limit and join_collapse_limit to 16, as suggested by 
Tom, actually slowed it down.

I mentioned before that I was running this from python, which complicated 
it slightly. I have now saved the command to a file on the Fedora side, so I can 
execute it in psql using the ‘\i’ command. It makes life easier, and I can use 
‘\timing’ to time it. It shows exactly the same results.

It could be an index problem, but I have just double-checked that, if I 
remove the lines from the body of the statement that actually select from the 
joined tables, it makes virtually no difference. However, maybe the planner 
checks to see what indexes it has before preparing the query, so that does not 
rule it out as a possibility.

I will play with it some more tomorrow, when my  brain is a bit 
fresher. I will report back with any results.

I am not convinced that the nested loop is a problem here.  I cannot think of a faster join plan than a nested loop when you only have one iteration of the loop (and looking through I did not see any loop counts above 1).

If you read and count ms carefully you will find that ar_tran_inv is scanned 6 times and each of these times is taking about 25ms.  25x6 is half of your query time right there and then you have the overhead in the joins on top of that.  Quick eyeball estimates is that this is where approx 200ms of your query time comes from.  Looking at this in more detail it doesn't look 

This is not a problem with too many tables in the join but the fact that you are joining the same tables in multiple times in ways you end up needing to repeatedly sequentially scan them.

I also don't think an index is going to help unless you have accounting data going way back (since you are looking for about a year's worth of data) or unless 90% of your transactions get marked as deleted.  So I think you are stuck with the sequential scans on this table and optimizing will probably mean reducing the number of times you scan that table.

Frank

-- 
Best Wishes,Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.
http://www.efficito.com/learn_more