Re: GroupAggregate and Integer Arrays

Merlin Moncure <mmoncure@xxxxxxxxx> · Fri, 23 Oct 2015 12:35:21 -0500

On Friday, October 23, 2015, David Osborne <david@xxxxxxxxxxx> wrote:
Hi,

Wondering if anyone could suggest how we could improve the performance of this type of query?
The intensive part is the summing of integer arrays as far as I can see.
We're thinking there's not much we can do to improve performance apart from throw more CPU at it... would love to be proven wrong though!

Query:

  explain (analyse,buffers) 
  select 
  sum(s2.array_a),sum(s2.array_b)
  from mytable s1 left join mytable s2
  on s1.code=s2.code and s1.buyer=s2.seller and s2.seller='XX'
  where s1.buyer='XX'
  group by s1.buyer,s1.code
;

Depesz Explain Link:

http://explain.depesz.com/s/m3XP

                                                                               QUERY PLAN                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=275573.49..336223.36 rows=2547 width=524) (actual time=1059.340..22946.772 rows=22730 loops=1)
   Buffers: shared hit=113596 read=1020 dirtied=15
   ->  Merge Left Join  (cost=275573.49..278850.09 rows=113560 width=524) (actual time=1058.773..1728.186 rows=240979 loops=1)
         Merge Cond: ((s1.code)::text = (s2.code)::text)
         Join Filter: (s1.buyer = (s2.seller)::bpchar)
         Buffers: shared hit=113596 read=1020 dirtied=15
         ->  Index Only Scan using mytable_buyer_idx on mytable s1  (cost=0.42..1226.06 rows=25465 width=12) (actual time=0.015..35.790 rows=22730 loops=1)
               Index Cond: (buyer = 'XX'::bpchar)
               Heap Fetches: 3739
               Buffers: shared hit=16805 dirtied=1
         ->  Sort  (cost=275573.07..275818.33 rows=98106 width=525) (actual time=1058.736..1141.560 rows=231662 loops=1)
               Sort Key: s2.code
               Sort Method: quicksort  Memory: 241426kB
               Buffers: shared hit=96791 read=1020 dirtied=14
               ->  Bitmap Heap Scan on mytable s2  (cost=12256.28..267439.07 rows=98106 width=525) (actual time=60.330..325.730 rows=231662 loops=1)
                     Recheck Cond: ((seller)::text = 'XX'::text)
                     Filter: ((seller)::bpchar = 'XX'::bpchar)
                     Buffers: shared hit=96791 read=1020 dirtied=14
                     ->  Bitmap Index Scan on mytable_seller_idx  (cost=0.00..12231.75 rows=254844 width=0) (actual time=40.474..40.474 rows=233244 loops=1)
                           Index Cond: ((seller)::text = 'XX'::text)
                           Buffers: shared hit=30 read=1020
 Total runtime: 22968.292 ms
(22 rows)

Table size:

=> select count(*) from mytable;
 count  
--------
 602669
(1 row)

Array types:

# select array_a,array_b from mytable limit 1;
      array_a      |     array_b      
---------------------------+---------------------------
 {0,0,0,0,0,0,0,0,0,0,0,0} | {0,0,0,0,0,0,0,0,0,0,0,0}

Example schema:

# \d mytable
                        Table "public.mytable"
      Column       |         Type          |       Modifiers        
-------------------+-----------------------+------------------------
 buyer             | character(2)          | not null
 code              | character varying(20) | not null
 seller            | character varying(50) | 
 array_a           | integer[]             | 
 array_b           | integer[]             | 
Indexes:
    "mytable_buyer_code_idx" UNIQUE, btree (buyer, code) CLUSTER
    "mytable_buyer_idx" btree (buyer)
    "mytable_code_idx" btree (code)
    "mytable_seller_idx" btree (seller)

Version:

> SELECT version() ;
                                                   version                                                    
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2), 64-bit
(1 row)

This is running on an AWS RDS instance.

Thanks for any pointers
-- 
David 

What's physical memory and setting of work_mem?

merlin