Re: Performance on large, append-only tables

"Kevin Grittner" <Kevin.Grittner@xxxxxxxxxxxxxxxxxxxxxxxxx> · Fri, 10 Feb 2012 09:44:02 -0600

David Yeu <david.yeu@xxxxxxxxx> wrote:

> We have indices against the primary key and the group_id.
> Our queries essentially fall into the following cases:
> 
>  * Š WHERE group_id = ? ORDER BY created_at DESC LIMIT 20;
>  * Š WHERE group_id = ? AND id > ? ORDER BY created_at DESC;
>  * Š WHERE group_id = ? AND id < ? ORDER BY created_at DESC LIMIT
>    20;
>  * Š WHERE group_id = ? ORDER BY created_at DESC LIMIT 20 OFFSET
>    ?;
> 
> In human words, we're looking for:
> 
>  * The most recent (20) rows.
>  * The most recent rows after a given `id'.
>  * Twenty rows before a given `id'.
>  * Pages of twenty rows.

The first thing I would try is building an index (perhaps
CONCURRENTLY to avoid disrupting production) on (group_id,
created_at).  It might also be worth creating an index on (group_id,
id, created_at), but that's a less-sure win.

> Originally, this table was part of our primary database, but
> recently we saw queries take upwards of thirty seconds or more to
> complete. Since we're serving web requests, that's basically
> unacceptable, and caused a lot of requests to backup.

With only the indexes you mention, it had to be doing either
complete table scans for each request, or a lot of random access to
rows it didn't need.

> Our interim solution has been to simply carve out a new database
> that hosts only this table, and that has worked to some degree. We
> aren't seeing thirty seconds plus database response times anymore,
> but some queries still take many seconds and the cost of spinning
> up a new master-slave configuration hasn't been cheap.

Well, throwing hardware at something doesn't generally hurt, but
it's not the first solution I would try, especially when the product
you're using has ways to tune performance.

> In the meantime, we're hoping to investigate other ways to
> optimize this table and the queries against it. Heroku's data team
> has suggested balling up these rows into arrays, where a single
> row would represent a group_id, and the data would occupy a single
> column as an array.

Ugh.  You're a long way from needing to give up on the relational
model here.

> And finally, we're also trying out alternative stores, since it
> seems like this data and its retrieval could be well suited to
> document-oriented backends. Redis and DynamoDB are currently the
> best contenders.

Your current use of PostgreSQL is more or less equivalent to driving
a car around in first gear.  You might consider a tuned PostgreSQL
as another alternative store.  :-)

-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance