The array_agg() has nothing to do with it; it's the GROUP BY. Without knowing what you are conceptually trying to accomplish, I can't say much. On my test 9.4.10 db, a similar example does a HashAggregate, so no sorting (google HashAggregate vs GroupAggregate). But it's still an expensive query because of all the I/O. If I wanted to instantly have the user ids for a specific (first_name, last_name, category_id) combo, I'd maintain a summary table via an insert trigger on the users table.

Kiriakos

From: <pgsql-general-owner@xxxxxxxxxxxxxx> on behalf of Alexis Woo <awoo2611@xxxxxxxxx>

I have a users table which contains ~70 million rows and looks like this:

   Column    |       Type        |
-------------+-------------------+
 id          | integer           |
 first_name  | character varying |
 last_name   | character varying |
 category_id | integer           |
Indexes:
    "users_id_idx" btree (id)
    "users_category_id_first_name_last_name_idx" btree (category_id, first_name, last_name)

I'm trying to retrieve the ids for each (first_name, last_name) pair within one specific category_id. The query I'm currently running is:

select array_agg(id)
from users
where category_id = 5432
group by first_name, last_name;

for which the explain analyze output is the following:

GroupAggregate  (cost=618461.35..626719.42 rows=26881 width=19) (actual time=1683.139..2613.386 rows=102943 loops=1)
  Group Key: first_name, last_name
  ->  Sort  (cost=618461.35..620441.86 rows=792206 width=19) (actual time=1683.116..2368.904 rows=849428 loops=1)
        Sort Key: first_name, last_name
        Sort Method: external merge  Disk: 25304kB
        ->  Bitmap Heap Scan on users  (cost=26844.16..524595.92 rows=792206 width=19) (actual time=86.046..229.469 rows=849428 loops=1)
              Recheck Cond: (category_id = 5432)
              Heap Blocks: exact=7938
              ->  Bitmap Index Scan on users_category_id_first_name_last_name_idx  (cost=0.00..26646.11 rows=792206 width=0) (actual time=85.006..85.006 rows=849428 loops=1)
                    Index Cond: (category_id = 5432)

What seems to greatly decrease the performance of the query is the "Sort Method: external merge  Disk: 25304kB" step.

Is it possible to aggregate the ids without doing a sort?
If not, what other options, apart from increasing work_mem, do I have?
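To see whether the planner will switch to a HashAggregate on your side, as it did on my test db, the quickest experiment is to give just your session more memory and re-run the explain. This is a diagnostic sketch only; the 256MB figure is an arbitrary guess, not a recommendation for a global setting:

set work_mem = '256MB';   -- arbitrary guess, affects this session only

explain analyze
select array_agg(id)
from users
where category_id = 5432
group by first_name, last_name;

reset work_mem;

If the plan flips from GroupAggregate + Sort to HashAggregate, the external merge sort disappears and you are left paying only for the bitmap heap scan I/O.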
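As for the summary table idea, here is a minimal sketch under some assumptions: the table and trigger names (users_by_name, users_by_name_append) are mine, first_name/last_name are assumed not null (a primary key requirement), and ON CONFLICT needs 9.5+, so on 9.4 you would need the usual update-then-insert dance. It also only handles INSERT; UPDATE and DELETE on users would need the same treatment.

-- Hypothetical summary table: one row per (category_id, first_name, last_name),
-- with the ids kept pre-aggregated.
create table users_by_name (
    category_id integer,
    first_name  character varying,
    last_name   character varying,
    ids         integer[],
    primary key (category_id, first_name, last_name)
);

-- One-time backfill from the existing rows.
insert into users_by_name (category_id, first_name, last_name, ids)
select category_id, first_name, last_name, array_agg(id)
from users
group by category_id, first_name, last_name;

-- Trigger function: append each newly inserted user's id to its summary row.
create function users_by_name_append() returns trigger as $$
begin
    insert into users_by_name (category_id, first_name, last_name, ids)
    values (new.category_id, new.first_name, new.last_name, array[new.id])
    on conflict (category_id, first_name, last_name)
    do update set ids = users_by_name.ids || new.id;
    return new;
end;
$$ language plpgsql;

create trigger users_by_name_ins
    after insert on users
    for each row execute procedure users_by_name_append();

After that, fetching the ids for a category is an index scan on the summary table's primary key, with no sort and no aggregation at query time:

select first_name, last_name, ids
from users_by_name
where category_id = 5432;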