Kevin Galligan wrote, On 29-10-08 23:35:
An example of a slow query is...
select count(*) from bigdatatable where age between 22 and 40 and state = 'NY';
explain analyze returned the following...
Aggregate  (cost=5179639.55..5179639.56 rows=1 width=0) (actual time=389529.895..389529.897 rows=1 loops=1)
  ->  Bitmap Heap Scan on bigdatatable  (cost=285410.65..5172649.63 rows=2795968 width=0) (actual time=6727.848..387159.175 rows=2553273 loops=1)
        Recheck Cond: ((state)::text = 'NY'::text)
        Filter: ((age >= 22) AND (age <= 40))
        ->  Bitmap Index Scan on idx_jstate  (cost=0.00..284711.66 rows=15425370 width=0) (actual time=6298.950..6298.950 rows=16821828 loops=1)
              Index Cond: ((state)::text = 'NY'::text)
Total runtime: 389544.088 ms
It looks like the index scans are around 6 seconds or so each, but then
the bitmap heap scan and aggregate jump up to 6 minutes.
Indeed. The cause is that PostgreSQL must visit the table data itself to
verify that each row is visible to the current transaction. This means A LOT
of data must be retrieved from disk.
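To get a feel for how much heap data that scan has to touch, you can check
the table's size in the catalog (just a diagnostic sketch; the numbers are
whatever your pg_class currently reports):

  -- Pages (8 kB each) and estimated rows for the table being scanned
  SELECT relpages,
         reltuples,
         pg_size_pretty(pg_relation_size('bigdatatable')) AS on_disk_size
  FROM pg_class
  WHERE relname = 'bigdatatable';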
The only real thing you can do is reduce I/O load, by reducing the
amount of data that must be traversed (or ensuring the data is stored
closely together, but that's really hard to get right). This requires
optimizing your database design for that single goal.
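For instance (just a sketch; the index name is made up and whether it pays
off depends on your workload), a composite index on (state, age) lets the
bitmap index scan apply both conditions, and clustering the table on it puts
matching rows on the same pages:

  -- Hypothetical composite index covering both filter columns
  CREATE INDEX idx_state_age ON bigdatatable (state, age);

  -- Rewrite the table in index order so rows for one state sit close together.
  -- Note: CLUSTER takes an exclusive lock and rewrites the whole table
  -- (8.3 syntax shown; older releases use CLUSTER idx_state_age ON bigdatatable).
  CLUSTER bigdatatable USING idx_state_age;
  ANALYZE bigdatatable;

Clustering is a one-off, though; newly inserted rows are not kept in that
order, which is part of why it is hard to get right.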
This will not make it scale any better than it currently does, however.
The query will scale O(N) with the size of your table; you need other
techniques to do better.
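One such technique, sketched here with made-up names, is to maintain a
pre-aggregated summary table and answer the count from that instead of the
250-million-row table (keeping it current, via triggers or a periodic
rebuild, is the hard part):

  -- Hypothetical rollup table: one row per (state, age) combination
  CREATE TABLE bigdatatable_counts AS
    SELECT state, age, count(*) AS n
    FROM bigdatatable
    GROUP BY state, age;

  CREATE INDEX idx_counts_state_age ON bigdatatable_counts (state, age);

  -- The original query becomes a sum over at most a few thousand rows
  SELECT sum(n)
  FROM bigdatatable_counts
  WHERE state = 'NY' AND age BETWEEN 22 AND 40;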
Another thing is spending extra money on hardware that can sustain
higher I/O seek rates (more and/or faster spindles).
- Joris
More detail on the table design and other stuff in a bit...
On Wed, Oct 29, 2008 at 6:18 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
On Wed, Oct 29, 2008 at 2:18 PM, Kevin Galligan <kgalligan@xxxxxxxxx> wrote:
> I'm approaching the end of my rope here. I have a large database.
> 250 million rows (ish). Each row has potentially about 500 pieces of
> data, although most of the columns are sparsely populated.
A couple of notes here. PostgreSQL stores null values as a single bit
in a bit field, making sparsely populated tables quite efficient as
long as you store the non-existent values as null and not '' or some
other real value.
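A quick way to see that (the table name and values here are made up, purely
for illustration) is to compare the stored row size with NULLs versus empty
strings:

  -- NULLs live in the per-row null bitmap; empty strings are real values
  CREATE TEMP TABLE sparse_demo (a int, b text, c text, d text);
  INSERT INTO sparse_demo VALUES (1, NULL, NULL, NULL);  -- sparse row
  INSERT INTO sparse_demo VALUES (2, '', '', '');        -- slightly larger row
  SELECT a, pg_column_size(sparse_demo.*) AS row_bytes FROM sparse_demo;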
Have you run explain analyze on your queries yet? Pick a slow one,
run explain analyze on it and post it and we'll see what we can do.
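For example, for a count like the one shown at the top of this thread:

  -- EXPLAIN ANALYZE actually runs the query and reports per-node timings
  EXPLAIN ANALYZE
  SELECT count(*) FROM bigdatatable
  WHERE age BETWEEN 22 AND 40 AND state = 'NY';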