Re: RAID arrays and performance

Matthew Wakeling <matthew@xxxxxxxxxxx> · Fri, 19 Sep 2008 15:59:12 +0100 (BST)

On Thu, 18 Sep 2008, Greg Smith wrote:
It's just that our system is doing a lot of bitmap index scans at the 
moment, and it'd help to be able to spread them across the 16 discs in the 
RAID array. It's the bottleneck in our system at the moment.

If you have some specific bitmap index scan test case suggestions you can 
pass along (either publicly or in private to me, I can probably help 
anonymize them), that's one of the things that has been holding this up.

Okay, here's a description of what we're doing. We are importing data from 
a large file (so we don't have a choice on the order that the data comes 
in). For each entry in the source, we have to look up the corresponding 
row in the database, and issue the correct "UPDATE" command according to 
what's in the database and what's in the data source. Long story - it 
isn't feasible to change the overall process.

In order to improve the performance, I made the system look ahead in the 
source, in groups of a thousand entries, so instead of running:

SELECT * FROM table WHERE field = 'something';

a thousand times, we now run:

SELECT * FROM table WHERE field IN ('something', 'something else'...);

with a thousand things in the IN. Very simple query. It does run faster 
than the individual queries, but it still takes quite a while. Here is an 
example query:

SELECT a1_.id AS a1_id, a1_.primaryIdentifier AS a2_ FROM Gene AS a1_ 
WHERE a1_.primaryIdentifier IN ('SPAC11D3.15', 'SPAC11D3.16c', 
'SPAC11D3.17', 'SPAC11D3.18c', 'SPAC15F9.01c', 'SPAC15F9.02', 'SPAC16.01', 
'SPAC18G6.01c', 'SPAC18G6.02c', 'SPAC18G6.04c', 'SPAC18G6.05c', 
'SPAC18G6.06', 'SPAC18G6.07c', 'SPAC18G6.09c', 'SPAC18G6.10', 
'SPAC18G6.11c', 'SPAC18G6.12c', 'SPAC18G6.13', 'SPAC18G6.14c', 
'SPAC18G6.15', 'SPAC1B9.01c', 'SPAC1D4.02c', 'SPAC1D4.03c', 'SPAC1D4.04', 
'SPAC1D4.05c', 'SPAC1D4.07c', 'SPAC1D4.08', 'SPAC1D4.09c', 'SPAC1D4.10', 
'SPAC1D4.11c', 'SPAC1F3.11', 'SPAC23A1.10', 'SPAC23E2.01', 'SPAC23E2.02', 
'SPAC23E2.03c', 'SPAC26A3.02', 'SPAC26A3.03c', 'SPAC26A3.05', 
'SPAC26A3.06', 'SPAC26A3.07c', 'SPAC26A3.08', 'SPAC26A3.09c', 
'SPAC26A3.10', 'SPAC26A3.11', 'SPAC26A3.14c', 'SPAC26A3.15c', 
'SPAC26A3.16', 'SPAC27F1.01c', 'SPAC27F1.03c', 'SPAC27F1.04c', 
'SPAC27F1.05c', 'SPAC27F1.06c', 'SPAC3H8.02', 'SPAC3H8.03', 'SPAC3H8.04', 
'SPAC3H8.05c', 'SPAC3H8.06', 'SPAC3H8.07c', 'SPAC3H8.08c', 'SPAC3H8.09c', 
'SPAC3H8.10', 'SPAC3H8.11', 'SPAC8E11.11', 'SPBC106.15', 'SPBC17G9.10', 
'SPBC24E9.15c', 'WBGene00000969', 'WBGene00003035', 'WBGene00004095', 
'WBGene00016011', 'WBGene00018672', 'WBGene00018674', 'WBGene00018675', 
'WBGene00018676', 'WBGene00018959', 'WBGene00018960', 'WBGene00018961', 
'WBGene00023407') ORDER BY a1_.id LIMIT 2000;

And the corresponding EXPLAIN ANALYSE:

 Limit  (cost=331.28..331.47 rows=77 width=17) (actual time=121.973..122.501 rows=78 loops=1)
   ->  Sort  (cost=331.28..331.47 rows=77 width=17) (actual time=121.968..122.152 rows=78 loops=1)
         Sort Key: id
         Sort Method:  quicksort  Memory: 29kB
         ->  Bitmap Heap Scan on gene a1_  (cost=174.24..328.87 rows=77 width=17) (actual time=114.311..121.705 rows=78 loops=1)
               Recheck Cond: (primaryidentifier = ANY ('{SPAC11D3.15...
               ->  Bitmap Index Scan on gene__key_primaryidentifier  (cost=0.00..174.22 rows=77 width=0) (actual time=44.434..44.434 rows=150 loops=1)
                     Index Cond: (primaryidentifier = ANY ('{SPAC11D3.15,SPAC11D3.16c...
 Total runtime: 122.733 ms
(9 rows)

Although it's probably in the cache, as it took 1073 ms the first time. 
The table has half a million rows, but tables all over the database are 
being accessed, so the cache is shared between several hundred million 
rows.

Postgres executes this query in two stages. First it does a trawl of the 
index (on field), and builds an bitmap. Then it fetches the pages 
according to the bitmap. I can see the second stage being quite easy to 
adapt for fadvise, but the first stage would be a little more tricky. Both 
stages are equally important, as they take a comparable amount of time.

We are running this database on a 16-spindle RAID array, so the benefits 
to our process of fully utilising them would be quite large. I'm 
considering if I can parallelise things a little though.

Alternately, if you'd like to join in on testing this all out more help would 
certainly be welcome.

How would you like me to help?

Matthew

--
What goes up must come down. Ask any system administrator.