Avoid sorting when doing an array_agg

Alexis Woo <awoo2611@xxxxxxxxx> · Fri, 2 Dec 2016 18:25:07 +0100

I have a users table which contains ~70 million rows that looks like this:

   Column    |       Type        |
-------------+-------------------+
 id          | integer           |
 first_name  | character varying |
 last_name   | character varying |
 category_id | integer           |
Indexes:
    "users_id_idx" btree (id)
    "users_category_id_first_name_last_name_idx" btree (category_id, first_name, last_name)

I'm trying to retrieve the ids for each (first_name, last_name) couple for one specific category_id.
The query that I'm currently doing is the following:

select array_agg(id)
from users
where category_id = 5432
group by first_name, last_name;

For which the explain analyze output is the following:

 GroupAggregate  (cost=618461.35..626719.42 rows=26881 width=19) (actual time=1683.139..2613.386 rows=102943 loops=1)
   Group Key: first_name, last_name
   ->  Sort  (cost=618461.35..620441.86 rows=792206 width=19) (actual time=1683.116..2368.904 rows=849428 loops=1)
         Sort Key: first_name, last_name
         Sort Method: external merge  Disk: 25304kB
         ->  Bitmap Heap Scan on users  (cost=26844.16..524595.92 rows=792206 width=19) (actual time=86.046..229.469 rows=849428 loops=1)
               Recheck Cond: (category_id = 5432)
               Heap Blocks: exact=7938
               ->  Bitmap Index Scan on users_category_id_first_name_last_name_idx  (cost=0.00..26646.11 rows=792206 width=0) (actual time=85.006..85.006 rows=849428 loops=1)
                     Index Cond: (category_id = 5432)

What seems to greatly decrease the performance of the query is the "Sort Method: external merge Disk: 7526kB."

Is it possible to aggregate the ids without doing a sort ?
If not, what other options, apart from increasing the work_mem, do I have ?

Thanks,

Alexis