On 27/6/24 17:58, Laura Hausmann wrote:
Heya & thank you for the response! That makes a lot of sense. I'm glad to hear it's on the radar of the team, but I understand that this is a complex task and won't happen anytime soon.
For the meantime, I've tried a couple of ways of rewriting the query, sadly none of which seem to translate to the production database:
Simply dropping the or/union clause (and adding a relationship from the user to themselves) fixes the problem in the test database (both user 1 (https://explain.depesz.com/s/ZY8l) and user 4 (https://explain.depesz.com/s/Q2Wk) run in 1~15ms, which isn't perfect but good enough), but not the production one (still fast for high-frequency (https://explain.depesz.com/s/DixF) and slow for low-frequency (https://explain.depesz.com/s/fIKm) users).
I also tried rewriting it as a join (https://explain.depesz.com/s/36Ve), but that also didn't seem to have an effect.
It's very possible I missed one or multiple ways the query could be rewritten.
I'm sadly not sure how I could generate a test dataset that more closely resembles the production workload. In case that would be helpful in debugging this further, any tips on that would be greatly appreciated.
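(The exact "self-relationship" variant wasn't posted; going by the column names visible in the plans further down, it presumably looks roughly like this - a guess, not the actual query:

SELECT o.*
FROM objects o
WHERE o."userId" IN (SELECT r."followeeId"
                     FROM relationships r
                     WHERE r."followerId" = :userid)
ORDER BY o.id DESC
LIMIT 50;

With a row in relationships where each user follows themselves, the OR branch on "userId" disappears entirely.)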
I am not sure my message made it through to you; I don't know if you are subscribed to the list. Here is an idea:
SELECT o.* FROM objects o WHERE o."userId" = :userid
UNION
SELECT o.* FROM objects o WHERE o."userId" IN
    (SELECT r."followeeId" FROM relationships r WHERE r."followerId" = :userid)
ORDER BY id DESC;
With your test data I get <= 1ms answers with all inputs.
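If the real query has a LIMIT (the plans below suggest LIMIT 50), a further refinement is to push the ORDER BY/LIMIT into each branch, so that each arm can stop early on an index - a sketch, assuming an index that covers ("userId", id), and not tested against the production data:

SELECT *
FROM (
    (SELECT o.* FROM objects o
     WHERE o."userId" = :userid
     ORDER BY o.id DESC LIMIT 50)
    UNION ALL
    (SELECT o.* FROM objects o
     WHERE o."userId" IN (SELECT r."followeeId"
                          FROM relationships r
                          WHERE r."followerId" = :userid)
     ORDER BY o.id DESC LIMIT 50)
) u
ORDER BY id DESC
LIMIT 50;

Note that UNION ALL skips the deduplication step; that is safe only if a user can't appear in their own followee list.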
Thanks in advance,
Laura Hausmann
On Thu, Jun 27, 2024 at 12:31 PM Andrei Lepikhov <lepihov@xxxxxxxxx> wrote:
On 6/27/24 07:50, Laura Hausmann wrote:
> I'd appreciate any and all input on the situation. If I've left out any
> information that would be useful in figuring this out, please tell me.
Thanks for this curious case, I like it!
First, you can try to avoid "OR" expressions - PostgreSQL has quite a limited set of optimisation/prediction tricks for such expressions.
Second, I see that Postgres predicts the wrong number of tuples. But even using my typical tool [1] to get more precise estimations, I don't see significant profit:
Limit (cost=10832.85..10838.69 rows=50 width=21)
-> Gather Merge (cost=10832.85..10838.92 rows=52 width=21)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=9832.83..9832.90 rows=26 width=21)
Sort Key: objects.id DESC
Sort Method: top-N heapsort Memory: 32kB
Worker 0: Sort Method: quicksort Memory: 32kB
Worker 1: Sort Method: quicksort Memory: 32kB
-> Parallel Seq Scan on objects
Filter: ((hashed SubPlan 1) OR ("userId" = 1))
Rows Removed by Filter: 183372
SubPlan 1
-> Nested Loop
-> Index Only Scan using users_pkey on
Index Cond: (id = 1)
Heap Fetches: 0
-> Index Only Scan using
"relationships_followerId_followeeId_idx" on relationships
Index Cond: ("followerId" = 1)
Heap Fetches: 0
Planning Time: 0.762 ms
Execution Time: 43.816 ms
Limit (cost=10818.83..10819.07 rows=2 width=21)
-> Gather Merge (cost=10818.83..10819.07 rows=2 width=21)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=9818.81..9818.81 rows=1 width=21)
Sort Key: objects.id DESC
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Parallel Seq Scan on objects
Filter: ((hashed SubPlan 1) OR ("userId" = 4))
Rows Removed by Filter: 183477
SubPlan 1
-> Nested Loop (cost=0.56..8.61 rows=1 width=4)
-> Index Only Scan using
"relationships_followerId_followeeId_idx" on relationships
Index Cond: ("followerId" = 4)
Heap Fetches: 0
-> Index Only Scan using users_pkey
Index Cond: (id = 4)
Heap Fetches: 0
Planning Time: 0.646 ms
Execution Time: 30.824 ms
But this was achieved just because of parallel worker utilisation. Disabling them, we get:
Limit (cost=14635.07..14635.08 rows=2 width=21) (actual
time=75.941..75.943 rows=0 loops=1)
-> Sort (cost=14635.07..14635.08 rows=2 width=21) (actual
time=75.939..75.940 rows=0 loops=1)
Sort Key: objects.id DESC
Sort Method: quicksort Memory: 25kB
-> Seq Scan on objects (cost=8.61..14635.06 rows=2 width=21)
(actual time=75.931..75.932 rows=0 loops=1)
Filter: ((hashed SubPlan 1) OR ("userId" = 4))
Rows Removed by Filter: 550430
SubPlan 1
-> Nested Loop (cost=0.56..8.61 rows=1 width=4)
(actual time=0.039..0.040 rows=0 loops=1)
-> Index Only Scan using
"relationships_followerId_followeeId_idx" on relationships
(cost=0.28..4.29 rows=1 width=8) (actual time=0.038..0.038 rows=0 loops=1)
Index Cond: ("followerId" = 4)
Heap Fetches: 0
-> Index Only Scan using users_pkey on users
(cost=0.29..4.31 rows=1 width=4) (never executed)
Index Cond: (id = 4)
Heap Fetches: 0
Planning Time: 0.945 ms
Execution Time: 76.123 ms
So, from the optimiser's point of view, it has done the best it could.
Theoretically, if you have a big table with indexes and must select a
small number of tuples, the ideal query plan will include parameterised
NestLoop JOINs. Unfortunately, parameterisation in PostgreSQL can't pass
inside a subquery. This could be a reason for new development, since
MSSQL can do such a trick, but that is a long way off.
You can try to rewrite your schema and query to avoid subqueries in
expressions altogether.
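One way to get an explicitly parameterised nested loop in current PostgreSQL is a LATERAL join - a sketch only, with column names taken from the plans above; note it covers just the followed users, so the user's own posts would need a separate UNION ALL branch:

SELECT p.*
FROM relationships r
CROSS JOIN LATERAL (
    -- inner query is re-executed per followee, parameterised by
    -- r."followeeId", so it can walk an index on ("userId", id)
    SELECT o.* FROM objects o
    WHERE o."userId" = r."followeeId"
    ORDER BY o.id DESC
    LIMIT 50
) p
WHERE r."followerId" = :userid
ORDER BY p.id DESC
LIMIT 50;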
I hope this message gave you some insights.
[1] https://github.com/postgrespro/aqo
--
regards, Andrei Lepikhov
--
Achilleas Mantzios
IT DEV - HEAD
IT DEPT
Dynacom Tankers Mgmt (as agents only)