Scaling SELECT:s with the number of disks on a stripe

Peter Schuller <peter.schuller@xxxxxxxxxxxx> · Fri, 30 Mar 2007 07:16:45 +0200

Hello,

I am looking to use PostgreSQL for storing some very simple flat data
mostly in a single table. The amount of data will be in the hundreds
of gigabytes range. Each row is on the order of 100-300 bytes in size;
in other words, small enough that I am expecting disk I/O to be seek
bound (even if PostgreSQL reads a full pg page at a time, since a page
is significantly smaller than the stripe size of the volume).

The only important performance characteristics are insertion/deletion
performance, and the performance of trivial SELECT queries whose WHERE
clause tests equality on one of the columns.

Other than absolute performance, an important goal is to be able to
scale fairly linearly with the number of underlying disk drives. We
are fully willing to take a disk seek per item selected, as long as it
scales.

To this end I have been doing some benchmarking to see whether the
plan is going to be feasable. On a 12 disk hardware stripe, insertion
performance does scale somewhat with concurrent inserters. However, I
am seeing surprising effects with SELECT:s: a single selecter
generates the same amount of disk activity as two concurrent selecters
(I was easily expecting about twice).

The query is simple:

SELECT * FROM test WHERE value = 'xxx' LIMIT 1000;

No ordering, no joins, no nothing. Selecting concurrently with two
different values of 'xxx' yields the same amount of disk activity
(never any significant CPU activity). Note that the total amount of
data is too large to fit in RAM (> 500 million rows), and the number
of distinct values in the value column is 10000. The column in the
WHERE clause is indexed.

So my first question is - why am I not seeing this scaling? The
absolute amount of disk activity with a single selecter is consistent
with what I would expect from a SINGLE disk, which is completely
expected since I never thought PostgreSQL would introduce disk I/O
concurrency on its own. But this means that adding additional readers
doing random-access reads *should* scale very well with 12 underlying
disks in a stripe.

(Note that I have seen fairly similar results on other RAID variants
too, including software RAID5 (yes yes I know), in addition to the
hardware stripe.)

These tests have been done Linux 2.6.19.3 and PostgreSQL 8.1.

Secondly, I am seeing a query plan switch after a certain
threshold. Observe:

perftest=# explain select * from test where val='7433' limit 1000; 
                                       QUERY PLAN                                        
-----------------------------------------------------------------------------------------
 Limit  (cost=0.00..4016.50 rows=1000 width=143)
   ->  Index Scan using test_val_ix on test  (cost=0.00..206620.88 rows=51443 width=143)
         Index Cond: ((val)::text = '7433'::text)
(3 rows)

Now increasing to a limit of 10000:

perftest=# explain select * from test where val='7433' limit 10000;
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Limit  (cost=360.05..38393.36 rows=10000 width=143)
   ->  Bitmap Heap Scan on test  (cost=360.05..196014.82 rows=51443 width=143)
         Recheck Cond: ((val)::text = '7433'::text)
         ->  Bitmap Index Scan on test_val_ix  (cost=0.00..360.05 rows=51443 width=0)
               Index Cond: ((val)::text = '7433'::text)
(5 rows)

The interesting part is that the latter query is entirely CPU bound
(no disk I/O at all) for an extended period of time before even
beginning to read data from disk. And when it *does* start performing
disk I/O, the performance is about the same as for the other case. In
other words, the change in query plan seems to do nothing but add
overhead.

What is the bitmap heap scan supposed to be doing that would increase
performance above a "seek once per matching row" plan? I haven't been
able to Google my way to what the intended benefit is of a heap scan
vs. a plain index scan.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@xxxxxxxxxxxx>'
Key retrieval: Send an E-Mail to getpgpkey@xxxxxxxxx
E-Mail: peter.schuller@xxxxxxxxxxxx Web: http://www.scode.org

Attachment:
pgpbtTLvGfqhx.pgp

Description: PGP signature