On Sat, 2016-12-03 at 13:08 -0500, Kiriakos Georgiou wrote:
> The array_agg() has nothing to do with it. It’s the group by.
> Without knowing what you are conceptually trying to accomplish, I
> can’t say much.

It *IS* caused by array_agg(). PostgreSQL can only do HashAggregate
when the whole hash table fits into work_mem, and in this case it has
to deal with aggregate states of unknown size, so it assumes each state
is 1kB IIRC. Per the plan, the GROUP BY is expected to produce ~27k
groups, so the HashAggregate would need about 30MB - presumably more
than work_mem, which is why the planner falls back to a sort-based
GroupAggregate. (A sketch of testing this by raising work_mem follows
at the end of this message.)

> On my test 9.4.10 db, a similar example does a HashAggregate, so no
> sorting (google HashAggregate vs GroupAggregate). But still it’s an
> expensive query because of all the I/O.

The query does almost no I/O, actually. The bitmap heap scan takes only
~230ms, which is not bad considering it produces ~1M rows. The expensive
part here seems to be the sort, but I doubt that's due to I/O - it only
produces temporary files that likely stay in RAM anyway. So the sort is
probably slow because of CPU, as it compares strings. In some locales
that may be very expensive - not sure which locale is used in this case,
as it was not mentioned. (A collation sketch also follows below.)

> If I wanted to instantly have the user ids for a specific first, last
> name and category combo, I’d maintain a summary table via an insert
> trigger on the users table.

Maybe. The question is whether it'll be a net win - maintenance of the
summary table will not be free, especially with arrays of ids. (A rough
trigger sketch follows below.)

regards

--
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
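A minimal sketch of the work_mem point above. The users table and its
columns are invented here, since the original query was not quoted; the
idea is just that raising work_mem above the planner's ~30MB estimate
may let it pick HashAggregate instead of Sort + GroupAggregate:

    -- hypothetical schema: users(first_name, last_name, category, user_id)
    -- raise the per-node memory limit for this session only
    SET work_mem = '64MB';

    -- if the estimated hash table now fits into work_mem, the plan
    -- should switch from GroupAggregate (with a Sort below it) to
    -- HashAggregate, eliminating the expensive sort
    EXPLAIN ANALYZE
    SELECT first_name, last_name, category, array_agg(user_id)
      FROM users
     GROUP BY first_name, last_name, category;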
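To check whether the sort is paying for expensive locale-aware string
comparisons, one can look at the database collation and try grouping
under the "C" collation, which compares raw bytes (memcmp) and is
typically much cheaper than e.g. en_US.UTF-8. The groups come out the
same; only the comparison cost changes. Again, the table and column
names are hypothetical:

    -- which collation governs string comparisons in this database?
    SHOW lc_collate;

    -- force byte-wise comparison just for this query
    EXPLAIN ANALYZE
    SELECT last_name COLLATE "C", first_name COLLATE "C", array_agg(user_id)
      FROM users
     GROUP BY last_name COLLATE "C", first_name COLLATE "C";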
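And a rough sketch of the summary-table approach from the quoted text,
to make the maintenance cost concrete. All names are invented, and the
upsert needs PostgreSQL 9.5+ for ON CONFLICT. Note that every insert
into users now rewrites a whole (potentially large) array row in the
summary table - which is exactly the "not free" part:

    CREATE TABLE user_summary (
        first_name text  NOT NULL,
        last_name  text  NOT NULL,
        category   text  NOT NULL,
        user_ids   int[] NOT NULL,
        PRIMARY KEY (first_name, last_name, category)
    );

    CREATE FUNCTION user_summary_upsert() RETURNS trigger AS $$
    BEGIN
        -- append the new user id to the matching group, creating the
        -- group on first sight
        INSERT INTO user_summary (first_name, last_name, category, user_ids)
        VALUES (NEW.first_name, NEW.last_name, NEW.category, ARRAY[NEW.user_id])
        ON CONFLICT (first_name, last_name, category)
        DO UPDATE SET user_ids = user_summary.user_ids || EXCLUDED.user_ids;
        RETURN NULL;  -- AFTER trigger, return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER user_summary_trg
        AFTER INSERT ON users
        FOR EACH ROW EXECUTE PROCEDURE user_summary_upsert();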