Re: Expected performance of querying 5k records from 4 million records?

Josh Berkus <josh@xxxxxxxxxxxx> · Fri, 15 Jun 2012 10:44:45 -0700

Anish,

> I've been struggling with this issue for the last several days, and I feel
> like I'm running into a few different issues that I don't understand.  I'm
> using postgres 9.0.8, and here's the OS I'm running this on:
> inux  2.6.18-308.4.1.el5xen #1 SMP Tue Apr 17 17:49:15 EDT 2012 x86_64
> x86_64 x86_64 GNU/Linux

RAM?  What does your disk support look like? (disks especially are
relevant, see below).

> explain analyze select data_id, dataset_id, stat from data_part_201 where
> dataset_id = 201
> and stat_id = 6 and data_id>=50544630 and data_id<=50549979;
> 
>  Bitmap Heap Scan on data_part_201  (cost=115.79..14230.69 rows=4383
> width=16) (actual time=36.103..1718.141 rows=5350 loops=1)
>    Recheck Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
>    Filter: ((dataset_id = 201) AND (stat_id = 6))
>    ->  Bitmap Index Scan on data_unq_201  (cost=0.00..114.70 rows=5403
> width=0) (actual time=26.756..26.756 rows=5350 loops=1)
>          Index Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
>  Total runtime: 1728.447 ms
> (6 rows)

I've seen extremely slow Bitmap Heap Scans like this before.  There's a
few things which can cause them in my experience:

1) Table is on disk, and random access to disk is very slow for some reason.

2) Recheck condition is computationally expensive (unlikely here)

3) Index is very bloated and needs reindexing (again, unlikely because
the initial Bitmap Index Scan is quite fast).

To test the above: run the exact same query several times in a row.
Does it get dramatically faster on the 2nd and successive runs?

>  Index Scan using data_part_202_dataset_regionset_data_idx on data_part_202
>  (cost=0.00..7987.83 rows=4750 width=16) (actual time=39.152..27339.401
> rows=5350 loops=1)
>    Index Cond: ((dataset_id = 202) AND (data_id >= 50544630) AND (data_id
> <= 50549979) AND (stat_id = 6))
>  Total runtime: 27349.091 ms
> (3 rows)

I'll point out that you're now querying a different partition than you
did above.

Again, this would point to random access to the underlying partition
being very slow.

> QUESTION 3:
> If I do the following:  reindex table data_part_204 the query now takes
> 50-70 milliseconds.  Is this because the table is getting cached?  How do I
> know if a particular query is coming from the cache? The reason why I think
> "reindex table" is caching the results, is that select count(*) from the
> partition also causes the query to be fast.

Yes, it's most likely because the table is being cached.  To test this,
run one of the slow query versions above repeatedly.

Things to investigate:

1) Is there some reason why random access on your disks would be
unusually slow?  iSCSI, cheap NAS/SAN, RAID 5+0, running on OSX or
Windows, etc.?

2) Is there a possibility that the partitions or indexes involved might
be unusually bloated, such as a large number of historical updates to
indexed columns?  If so, does a CLUSTER on one partition make the issue
go away?

3) Test your database using PostgreSQL 9.2 Beta2.  Do the new index-only
scans solve this issue?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance