Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

Richard Huxton <dev@xxxxxxxxxxxx> · Fri, 22 Jan 2010 18:25:00 +0000

On 22/01/10 18:03, Tory M Blue wrote:
On Fri, Jan 22, 2010 at 1:42 AM, Richard Huxton<dev@xxxxxxxxxxxx>  wrote:

On 21/01/10 22:15, Tory M Blue wrote:

2. If it's mostly tagged=true you are interested in you can always use a
partial index: CREATE INDEX ... (makeid) WHERE tagged
This might be a win even if you need a second index with WHERE NOT tagged.

Partial index doesn't seem to fit here due to the fact that there are 35-40%
Marked True.

Didn't think about creating a second index for false, may give that a shot.

If you're mostly search tagged=true, try the partial index - it'll mean 
the planner is just scanning the index for the one term.

Also, either I've not had enough cofee yet, or a bitmap scan is an odd
choice for only ~ 13000 rows out of 100 million.

  *                "        ->    Bitmap Index Scan on

idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)"*

*                "              Index Cond: ((makeid =
'b1mw-ferman'::text)
AND (tagged = true))"*

Otherwise, see what Craig said.

I'm assuming this isn't the query that is CPU bound for a long time. Unless
your table is horribly bloated, there's no reason for that judging by this
plan.

It is, but not always, only when there are 10K more matches. And the explain
unfortunately is sometimes way high or way low, so the expalin is hit and
miss.

But the same sql that returns maybe 500 rows is pretty fast, it's the return
of 10K+ rows that seems to stall and is CPU Bound.

Hmm - might be able to push that cross-over point up a bit by tweaking 
various costs, but you've got to be careful you don't end up making all 
your other queries worse. It'd be good to figure out what the problem is 
first.

Looking at the query there are four stages:
 1. Scan the index, build a bitmap of heap pages with matching rows
 2. Scan those pages, find the rows that match
 3. Run DISTINCT on the uids
 4. Count them
I wonder if it could be the DISTINCT. What happens with a count(*) or 
count(uid) instead? Also - you might find EXPLAIN ANALYZE more useful 
than straight EXPLAIN here. That will show actual times for each stage.

On Craig's branch of this thread, you say you call it 6000 times with 
different "makeid"s. Any reason why you can't join to a temp table and 
just do it in one query?

--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance