Matthew wrote:
> On Tue, 4 Dec 2007, Gregory Stark wrote:
>> "Matthew" <matthew@xxxxxxxxxxx> writes:
>>> Does Postgres issue requests to each random access in turn, waiting for
>>> each one to complete before issuing the next request (in which case the
>>> performance will not exceed that of a single disc), or does it use some
>>> clever asynchronous access method to send a queue of random access
>>> requests to the OS that can be distributed among the available discs?
>> Sorry, it does the former, at least currently.
>> That said, this doesn't really come up nearly as often as you might think.
> Shame. It comes up a *lot* in my project. A while ago we converted a task
> that processes a queue of objects to processing groups of a thousand
> objects, which sped up the process considerably. So we run an awful lot of
> queries with IN lists with a thousand values. They hit the indexes, then
> fetch the rows by random access. A full table sequential scan would take
> much longer. It'd be awfully nice to have those queries go twelve times
> faster.
The bitmap scan method does ordered reads of the table, which can
partially take advantage of sequential reads. I'm not sure whether a bitmap
scan is optimal for your situation, or whether your situation would allow
it to be taken advantage of.
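
To make the batched lookup above concrete, here is a rough sketch with libpq.
The connection string, the "objects" table and its "id"/"payload" columns are
invented for illustration, and id = ANY() with an array parameter stands in
for the literal 1000-value IN lists being described; either way, one query
carries the whole batch and the server still reaches the matching rows
through the index plus (possibly random) heap access, unless a bitmap scan
reorders the heap visits.

/* Rough sketch with libpq, for illustration only: the connection string,
 * the "objects" table and its "id"/"payload" columns are invented, and
 * id = ANY($1::int[]) stands in for a literal 1000-value IN list. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn     *conn = PQconnectdb("dbname=test");
    PGresult   *res;
    const char *params[1];
    char        ids[8192];
    int         off = 0, i;

    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* Build the text form of an int[] holding 1000 ids: {1,2,...,1000}. */
    off += snprintf(ids + off, sizeof(ids) - off, "{");
    for (i = 1; i <= 1000; i++)
        off += snprintf(ids + off, sizeof(ids) - off, "%d%s",
                        i, i < 1000 ? "," : "}");

    /* One round trip fetches the whole batch; the server still reaches
     * the matching rows by index lookups plus heap access. */
    params[0] = ids;
    res = PQexecParams(conn,
                       "SELECT id, payload FROM objects"
                       " WHERE id = ANY($1::int[])",
                       1, NULL, params, NULL, NULL, 0);

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("fetched %d rows\n", PQntuples(res));
    else
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));

    PQclear(res);
    PQfinish(conn);
    return 0;
}

Running the statement under EXPLAIN ANALYZE is the easiest way to see
whether the planner picks a bitmap index + bitmap heap scan for a given
list size, which is where the ordered reads would help.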
>> Normally queries fit mostly in either the large batch query domain or the
>> small quick oltp query domain. For the former Postgres tries quite hard to do
>> sequential i/o which the OS will do readahead for and you'll get good
>> performance. For the latter you're normally running many simultaneous such
>> queries and the raid array helps quite a bit.
> Having twelve discs will certainly improve the sequential IO throughput!
> However, if this was implemented (and I have *no* idea whatsoever how hard
> it would be), then large index scans would scale with the number of discs
> in the system, which would be quite a win, I would imagine. Large index
> scans can't be that rare!
Do you know that there is a problem, or are you speculating about one? I
think your case would be far more compelling if you could show a
problem. :-)
I would think that at a minimum, having 12 disks with RAID 0 or RAID 1+0
would allow your insane queries to run concurrently with up to 12 other
queries. Unless your insane query is the only query in use on the
system, I think you may be speculating about a nearly non-existent
problem. Just a suggestion...
I recall talk of more intelligent table scanning algorithms, and the use
of asynchronous I/O to benefit from RAID arrays, but the numbers
put forward to convince people that the change would have an effect have
been less than impressive.
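
For what it's worth, the mechanism usually talked about for this is prefetch
hinting rather than fully asynchronous I/O. The sketch below is not
PostgreSQL code, just an illustration of the idea: hand the kernel
posix_fadvise(POSIX_FADV_WILLNEED) hints for a batch of random 8 kB blocks
before reading them, so a RAID array has a chance to service several requests
at once. The file name and block numbers are made up, and any benefit depends
entirely on what the OS and array do with the hints.

/* Sketch only, not PostgreSQL source: hint the kernel about a batch of
 * random 8 kB blocks before reading them, so the reads can be spread
 * across the discs in a RAID array. */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLCKSZ 8192

static void fetch_blocks(int fd, const off_t *blocks, int nblocks)
{
    char buf[BLCKSZ];
    int  i;

    /* Pass 1: tell the OS about every block we are about to need. */
    for (i = 0; i < nblocks; i++)
        (void) posix_fadvise(fd, blocks[i] * BLCKSZ, BLCKSZ,
                             POSIX_FADV_WILLNEED);

    /* Pass 2: actually read them; with luck the I/O now overlaps. */
    for (i = 0; i < nblocks; i++)
        if (pread(fd, buf, BLCKSZ, blocks[i] * BLCKSZ) != BLCKSZ)
            perror("pread");
}

int main(int argc, char **argv)
{
    /* File name and block numbers are placeholders for illustration. */
    off_t blocks[] = { 9731, 12, 88417, 503, 40960 };
    int   fd = open(argc > 1 ? argv[1] : "datafile", O_RDONLY);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    fetch_blocks(fd, blocks, (int) (sizeof(blocks) / sizeof(blocks[0])));
    close(fd);
    return 0;
}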
Cheers,
mark
--
Mark Mielke <mark@xxxxxxxxx>