Re: Why should such a simple query over indexed columns be so slow?

Alessandro Gagliardi <alessandro@xxxxxxxx> · Mon, 30 Jan 2012 12:35:04 -0800

Well that was a lot faster:
"HashAggregate  (cost=156301.82..156301.83 rows=2 width=26) (actual time=2692.806..2692.807 rows=2 loops=1)"
"  ->  Bitmap Heap Scan on blocks  (cost=14810.54..155828.95 rows=472871 width=26) (actual time=289.828..1593.893 rows=575186 loops=1)"
"        Recheck Cond: (created > '2012-01-29 00:00:00+00'::timestamp with time zone)"
"        Filter: (shared IS FALSE)"
"        ->  Bitmap Index Scan on blocks_created_idx  (cost=0.00..14786.89 rows=550404 width=0) (actual time=277.407..277.407 rows=706663 loops=1)"
"              Index Cond: (created > '2012-01-29 00:00:00+00'::timestamp with time zone)"
"Total runtime: 2693.107 ms"

To answer your (non-)question about Heroku, it's a cloud service, so I don't host PostgreSQL myself. I'm not sure how much I can mess with things like GUC since I don't even have access to the "postgres" database on the server. I am a long time SQL user but new to Postgres so I welcome suggestions on where to start with that sort of thing. Setting enable_seqscan=false made a huge difference, so I think I'll start there.

Thank you very much!
-Alessandro

On Mon, Jan 30, 2012 at 11:24 AM, Claudio Freire <klaussfreire@xxxxxxxxx> wrote:

On Mon, Jan 30, 2012 at 4:13 PM, Alessandro Gagliardi

<alessandro@xxxxxxxx> wrote:

> So, here's the query:

>

> SELECT private, COUNT(block_id) FROM blocks WHERE created > 'yesterday' AND

> shared IS FALSE GROUP BY private

>

> What confuses me is that though this is a largish table (millions of rows)

> with constant writes, the query is over indexed columns of types timestamp

> and boolean so I would expect it to be very fast. The clause where created >

> 'yesterday' is there mostly to speed it up, but apparently it doesn't help

> much.

The number of rows touched is ~0.5M, and is correctly estimated, which

would lead me to believe PG estimates the index plan to be slower.

You could try by executing first "set enable_seqscan=false;" and then

your query with explain analyze again. You'll probably get an index

scan, and you'll see both how it performs and how PG thought it would

perform. Any mismatch between the two probably means you'll have to

change the planner tunables (the x_tuple_cost ones) to better match

your hardware.

> As for Hardware: I'm using Heroku's "Ronin" setup which involves 1.7 GB

> Cache. Beyond that I don't really know.

snip

> As for GUC Settings: Again, I don't know what this is. Whatever Heroku

> defaults to is what I'm using.

And there's your problem. Without knowing/understanding those, you

won't get anywhere. I don't know what Heroku is, but you should find

out both hardware details and PG configuration details.

> As for Maintenance Setup: I let Heroku handle that, so I again, I don't

> really know. FWIW though, vacuuming should not really be an issue (as I

> understand it) since I don't really do any updates or deletions. It's pretty

> much all inserts and selects.

Maintainance also includes analyzing the table, to gather stats that

feed the optimizer, and it's very important to keep the stats

accurate. You can do it manually - just perform an ANALYZE. However,

the plan doesn't show any serious mismatch between expected and actual

rowcounts, which suggests stats aren't your problem.