Re: Inconsistent query performance based on relation hit frequency

Achilleas Mantzios - cloud <a.mantzios@xxxxxxxxxxxxxxxxxxxx> · Thu, 27 Jun 2024 16:27:54 +0300



    On 6/27/24 03:50, Laura Hausmann wrote:

    
          Heya, I hope the title is somewhat descriptive. I'm
            working on a decentralized social media platform and have
            encountered the following performance issue/quirk, and would
            like to ask for input, since I'm not sure I missed anything.

            
          I'm running PostgreSQL 16.2 on x86_64-pc-linux-gnu, compiled
          by gcc (GCC) 13.2.1 20230801, 64-bit, running on an Arch Linux
          box with 128GB of RAM & an 8c16t Ryzen 3700X CPU. Disk is
          a NVME RAID0.

          
        Postgres configuration: https://paste.depesz.com/s/iTv
        

        I'm using autovacuum defaults & am running a manual
          VACUUM ANALYZE on the entire database nightly.

        
        The relevant database parts consist of a table with posts
          (note), a table with users (user), and a table with follow
          relationships (following). The query in question takes the
          most recent n (e.g. 50) posts, filtered by the users follow
          relations.

          
        The note table on my main production instance grows by
          about 200k entries per week.

          
        Schema & tuple counts: https://paste.depesz.com/s/cfI
        

        Here's the shortest query I can reproduce the issue with: https://paste.depesz.com/s/RoC

          Specifically, it works well for users that follow a relatively
          large amount of users (https://explain.depesz.com/s/tJnB),
          and is very slow for users that follow a low amount of users /
          users that post infrequently (https://explain.depesz.com/s/Mtyr).

        
        From what I can tell, this is because this query causes
          postgres to scan the note table from the bottom (most recent
          posts first), discarding anything by users that are not
          followed.

        
          Curiously, rewriting the query like this (https://paste.depesz.com/s/8rN)
          causes the opposite problem, this query is fast for users with
          a low following count (https://explain.depesz.com/s/yHAz#query),
          and slow for users with a high following count (https://explain.depesz.com/s/1v6L,
          https://explain.depesz.com/s/yg3N).

          
        These numbers are even further apart (to the point of
          10-30s query timeouts) in the most extreme outlier cases I've
          observed, and on lower-end hardware.

          
        I've sidestepped the issue by running either of these
          queries based on a heuristic that checks whether there are
          more than 250 matching posts in the past 7 days, recomputed
          once per day for every user, but it feels more like a hack
          than a proper solution.

          
          I'm able to make the planner make a sensible decision in
            both cases by setting enable_sort = off, but that tanks
            performance for the rest of my application, is even more of
            a hack, and doesn't seem to work in all cases.

          
        I've been able to reproduce this issue with mock data (https://paste.depesz.com/s/CnY),
          though it's not generating quite the same query plans and is
          behaving a bit differently.

        
    Before deep dive into everybody's favorite topic you may simplify
      your query :
    select o.*
          from objects o where o."userId" = :userid UNION select o.*
          from objects o where o."userId" IN 

        
    (SELECT
          r."followeeId" FROM relationships r WHERE
        r."followerId"= :userid)

      
    postgres@[local]/laura=#
          explain (analyze, buffers) select o.* from objects o where
          o."userId" = 1 UNION select o.* from objects o where
          o."userId" IN (SELECT r."followeeId" FROM relati

        onships r WHERE r."followerId"=1) ORDER BY id DESC ;
        

                                                                                        QUERY
        PLAN
                                                                                     

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

        ---
        

        Sort  (cost=8622.04..8767.98 rows=58376 width=40) (actual
        time=1.041..1.053 rows=314 loops=1)
        

          Sort Key: o.id DESC
        

          Sort Method: quicksort  Memory: 39kB
        

          Buffers: shared hit=1265
        

          ->  HashAggregate  (cost=3416.92..4000.68 rows=58376
        width=40) (actual time=0.900..1.006 rows=314 loops=1)
        

                Group Key: o.id, o."userId", o.data
        

                Batches: 1  Memory Usage: 1585kB
        

                Buffers: shared hit=1265
        

                ->  Append  (cost=0.42..2979.10 rows=58376 width=40)
        (actual time=0.024..0.816 rows=314 loops=1)
        

                      Buffers: shared hit=1265
        

                      ->  Index Scan using "objects_userId_idx" on
        objects o  (cost=0.42..3.10 rows=17 width=21) (actual
        time=0.003..0.003 rows=0 loops=1)
        

                            Index Cond: ("userId" = 1)
        

                            Buffers: shared hit=3
        

                      ->  Nested Loop  (cost=0.70..2684.12 rows=58359
        width=21) (actual time=0.020..0.794 rows=314 loops=1)
        

                            Buffers: shared hit=1262
        

                            ->  Index Only Scan using
        "relationships_followerId_followeeId_idx" on relationships r
         (cost=0.28..7.99 rows=315 width=4) (actual time=0.011..0.030
        rows=315 loops=

        1)
        

                                  Index Cond: ("followerId" = 1)
        

                                  Heap Fetches: 0
        

                                  Buffers: shared hit=3
        

                            ->  Index Scan using "objects_userId_idx"
        on objects o_1  (cost=0.42..6.65 rows=185 width=21) (actual
        time=0.002..0.002 rows=1 loops=315)
        

                                  Index Cond: ("userId" =
        r."followeeId")
        

                                  Buffers: shared hit=1259
        

        Planning:
        

          Buffers: shared hit=8
        

        Planning Time: 0.190 ms
        

        Execution Time: 1.184 ms
        

        (26 rows)
        

        Time: 1.612 ms
        

        postgres@[local]/laura=# explain (analyze, buffers) select o.*
        from objects o where o."userId" = 4 UNION select o.* from
        objects o where o."userId" IN (SELECT r."followeeId" FROM relati

        onships r WHERE r."followerId"=4) ORDER BY id DESC ;
        

                                                                                      QUERY
        PLAN
                                                                                       

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        

        Sort  (cost=27.53..28.03 rows=202 width=40) (actual
        time=0.015..0.016 rows=0 loops=1)
        

          Sort Key: o.id DESC
        

          Sort Method: quicksort  Memory: 25kB
        

          Buffers: shared hit=5
        

          ->  HashAggregate  (cost=17.77..19.79 rows=202 width=40)
        (actual time=0.013..0.013 rows=0 loops=1)
        

                Group Key: o.id, o."userId", o.data
        

                Batches: 1  Memory Usage: 40kB
        

                Buffers: shared hit=5
        

                ->  Append  (cost=0.42..16.26 rows=202 width=40)
        (actual time=0.011..0.012 rows=0 loops=1)
        

                      Buffers: shared hit=5
        

                      ->  Index Scan using "objects_userId_idx" on
        objects o  (cost=0.42..3.10 rows=17 width=21) (actual
        time=0.005..0.005 rows=0 loops=1)
        

                            Index Cond: ("userId" = 4)
        

                            Buffers: shared hit=3
        

                      ->  Nested Loop  (cost=0.70..12.14 rows=185
        width=21) (actual time=0.005..0.005 rows=0 loops=1)
        

                            Buffers: shared hit=2
        

                            ->  Index Only Scan using
        "relationships_followerId_followeeId_idx" on relationships r
         (cost=0.28..1.39 rows=1 width=4) (actual time=0.005..0.005
        rows=0 loops=1)
        

                                  Index Cond: ("followerId" = 4)
        

                                  Heap Fetches: 0
        

                                  Buffers: shared hit=2
        

                            ->  Index Scan using "objects_userId_idx"
        on objects o_1  (cost=0.42..8.90 rows=185 width=21) (never
        executed)
        

                                  Index Cond: ("userId" =
        r."followeeId")
        

        Planning:
        

          Buffers: shared hit=8
        

        Planning Time: 0.201 ms
        

        Execution Time: 0.048 ms
        

        (25 rows)
        

        Time: 0.490 ms

        
              I'd appreciate any and all input on the situation. If
                I've left out any information that would be useful in
                figuring this out, please tell me.

              
              Thanks in advance,

              
                      Laura Hausmann