Re: Hardware/OS recommendations for large databases

Ron <rjpeace@xxxxxxxxxxxxx> · Fri, 18 Nov 2005 15:29:11 -0500

Breaking the ~120MBps pg IO ceiling by any means 
is an important result, particularly when you 
get a ~2x improvement.  I'm curious how far we 
can get using simple approaches like this.

At 10:13 AM 11/18/2005, Luke Lonergan wrote:
Dave,

On 11/18/05 5:00 AM, "Dave Cramer" <pg@xxxxxxxxxxxxx> wrote:
>
> Now there's an interesting line drawn in the sand. I presume you have
> numbers to back this up ?
>
> This should draw some interesting posts.

Part 2: The answer

System A:
This system is running RedHat 3 Update 4, with a Fedora 2.6.10 Linux kernel.

On a single table with 15 columns (the Bizgres 
IVP) at double the size of memory (2.12GB), Postgres 
8.0.3 with Bizgres enhancements takes 32 seconds 
to scan the table: that's 66 MB/s.  Not the 
efficiency I'd hoped for from the onboard SATA 
controller; I would have expected to get 85% of 
the 100MB/s raw read performance.

Have you tried the large read ahead trick with 
this system?  It would be interesting to see how 
much it would help.  It might even be worth it to 
do the experiment at all of [default, 2x default, 
4x default, 8x default, etc] read ahead until 
either a) you run out of resources to support the 
desired read ahead, or b) performance levels 
off.  I can imagine the results being very enlightening.
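
A minimal sketch of how that sweep could be laid out, assuming the readahead is set per block device with `blockdev --setra` (which counts in 512-byte sectors).  The 256KB starting point is Luke's "I think" default, the 16MB cap is System B's tuned value, and `/dev/sdX` is a placeholder device name, not something from either test system.  The script only prints the commands; running them (as root) and timing the scan after each one is left to the tester.

```python
SECTOR_BYTES = 512  # blockdev --setra takes a count of 512-byte sectors


def readahead_sweep(default_kb, max_kb):
    """Return readahead sizes in KB: default, 2x, 4x, ... up to max_kb."""
    sizes = []
    kb = default_kb
    while kb <= max_kb:
        sizes.append(kb)
        kb *= 2
    return sizes


def setra_command(kb, device):
    """Build the blockdev command that sets `kb` KB of readahead on `device`."""
    sectors = kb * 1024 // SECTOR_BYTES
    return f"blockdev --setra {sectors} {device}"


if __name__ == "__main__":
    # Assumed range: 256KB default up to the 16MB used on System B.
    for kb in readahead_sweep(256, 16 * 1024):
        print(f"{kb:>6} KB  ->  {setra_command(kb, '/dev/sdX')}")
```

Plotting scan time against each setting should show where performance levels off.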

System B:
This system is running an XFS filesystem, and 
has been tuned to use very large (16MB) 
readahead.  It's running the CentOS 4.1 distro, 
which uses a Linux 2.6.9 kernel.

Same test as above, but with 17GB of data takes 
69.7 seconds to scan (!)  That's 244.2MB/s, 
which is obviously double my earlier point of 
110-120MB/s.  This system is running with a 16MB 
Linux readahead setting, let's try it with the 
default (I think) setting of 256KB - AHA! Now we get 171.4 seconds or 99.3MB/s.

The above experiment would seem useful here as well.
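
For anyone checking the arithmetic, here is a small sketch that recomputes the rates from the sizes and times quoted above.  It assumes decimal units (1GB = 1000MB), which is a guess about how the figures were rounded; the variable names are mine.

```python
def scan_rate_mb_s(size_mb, seconds):
    """Sequential scan throughput in MB/s."""
    return size_mb / seconds


# System A: 2.12GB (taken here as 2120 MB) scanned in 32 seconds.
rate_a = scan_rate_mb_s(2120, 32)
# Luke's expectation: 85% of the 100MB/s raw read rate.
expected_a = 0.85 * 100

# System B: same data at both settings, so the speedup from 16MB
# readahead is just the ratio of the two scan times.
speedup_b = 171.4 / 69.7
# Data size implied by the reported rate and time ("17GB" is rounded).
implied_gb = 244.2 * 69.7 / 1000

if __name__ == "__main__":
    print(f"System A: {rate_a:.1f} MB/s measured vs {expected_a:.0f} MB/s expected")
    print(f"System B: {speedup_b:.2f}x faster with 16MB readahead")
    print(f"System B: implied table size {implied_gb:.2f} GB")
```

The System B speedup comes out a bit under 2.5x, in line with the "double" claim.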

Summary:

<cough, cough> OK - you can get more I/O 
bandwidth out of the current I/O path for 
sequential scan if you tune the filesystem for 
large readahead.  This is a cheap alternative to 
overhauling the executor to use asynch I/O.

Still, there is a CPU limit here - this is not 
I/O bound, it is CPU limited as evidenced by the 
sensitivity to readahead settings.   If the 
filesystem could do 1GB/s, you wouldn't go any faster than 244MB/s.

- Luke

I respect your honesty in reporting results that 
were different than your expectations or 
previously taken stance.  Alan Stange's comment 
re: the use of direct IO along with your comments 
re: async IO and mem copies plus the results of 
these experiments could very well point us 
directly at how to most easily solve pg's CPU boundness during IO.

[HACKERS] are you watching this?

Ron
