Re: to_jsonb performance on array aggregated correlated subqueries

Nico Heller <nico.heller@xxxxxxxxx> · Fri, 12 Aug 2022 19:18:33 +0000



    Am 12.08.2022 um 21:15 schrieb Rick
      Otten:

    
          On Fri, Aug 12, 2022 at 3:07
            PM Nico Heller <nico.heller@xxxxxxxxx>
            wrote:

          
              Am 12.08.2022 um 21:02 schrieb Rick Otten:

              
                    On Fri, Aug 12,
                      2022 at 2:50 PM Nico Heller <nico.heller@xxxxxxxxx>
                      wrote:

                    
                    Good day,

                      
                      consider the following query:

                      
                      WITH aggregation(

                           SELECT

                                  a.*,

                                 (SELECT array_agg(b.*) FROM b WHERE
                      b.a_id = a.id)
                      as "bs",

                                 (SELECT array_agg(c.*) FROM c WHERE
                      c.a_id = a.id)
                      as "cs",

                                 (SELECT array_agg(d.*) FROM d WHERE
                      d.a_id = a.id)
                      as "ds",

                                 (SELECT array_agg(e.*) FROM d WHERE
                      e.a_id = a.id)
                      as "es"

                           FROM a WHERE a.id IN (<some big
                      list, ranging from 20-180 entries)

                      )

                      SELECT to_jsonb(aggregation.*) as "value" FROM
                      aggregation;

                      
                    - You do have an index on `b.a_id` and
                      `c.a_id`, etc... ?  You didn't say...
                  
                
              Yes there are indices on all referenced columns of the
              subselect (they are all primary keys anyway)

              
                    - Are you sure it is the `to_jsonb` that is
                      making this query slow?
                  
                
              Yes, EXPLAIN ANALYZE shows a doubling of execution time -
              I don't have numbers on the memory usage difference though

              
                    - Since you are serializing this for easy
                      machine readable consumption outside of the
                      database, does it make a difference if you use
                      `to_json` instead?
                    

              Using to_json vs. to_jsonb makes no difference in
                regards to runtime, I will check if the memory
                consumption is different on monday - thank you for the
                idea!

              
          One other thought.  Does it help if you convert the
            arrays to json first before you convert the whole row?  ie,
            add some to_json()'s around the bs, cs, ds, es columns in
            the CTE.  I'm wondering if breaking the json conversions up
            into smaller pieces will let the outer to_json() have less
            work to do and overall run faster.  You could even
            separately serialize the elements inside the array too.  I
            wouldn't think it would make a huge difference, you'd be
            making a bunch of extra to_json calls, but maybe it avoids
            some large memory structure that would otherwise have to be
            constructed to serialize all of those objects in all of the
            arrays all at the same time.
        
      
    Using jsonb_array_agg and another to_jsonb at the (its still
      needed to create one value at the end and to include the columns
      "a.*") worsens the query performance by 100%, I can't speak for
      the memory usage because I would have to push these changes to
      preproduction - will try this on monday, thanks.