Luke,
Interesting numbers. I'm a little concerned about the use of blockdev --setra 16384. If I understand this correctly, it assumes that the table is contiguous on the disk, does it not?
Dave
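
(For reference, a minimal sketch of the knob being discussed, assuming the database volume is /dev/sda; the actual device is not named in the thread. blockdev's readahead units are 512-byte sectors, and the setting is per block device, so it applies to every file on that disk, not to any one table.)

# Show the current readahead setting for the device (in 512-byte sectors).
blockdev --getra /dev/sda

# Raise it to 16384 sectors, the tuned value discussed below.
blockdev --setra 16384 /dev/sda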
On 18-Nov-05, at 10:13 AM, Luke Lonergan wrote:

Dave,

On 11/18/05 5:00 AM, "Dave Cramer" <pg@xxxxxxxxxxxxx> wrote:

> Now there's an interesting line drawn in the sand. I presume you have
> numbers to back this up?
>
> This should draw some interesting posts.

Part 2: The answer

System A:

This system is running RedHat 3 Update 4, with a Fedora 2.6.10 Linux kernel. On a single table with 15 columns (the Bizgres IVP) at double the size of memory (2.12GB), Postgres 8.0.3 with Bizgres enhancements takes 32 seconds to scan the table: that's 66 MB/s. Not the efficiency I'd hoped for from the onboard SATA controller; I would have expected to get 85% of the 100MB/s raw read performance.

So that's $1,200 / 66 MB/s (without adjusting for 2003 price versus now) = 18.2 $/MB/s.

Raw data:

[llonergan@kite4 IVP]$ cat scan.sh
#!/bin/bash
time psql -c "select count(*) from ivp.bigtable1" dgtestdb
[llonergan@kite4 IVP]$ cat sysout1
  count
----------
 10000000
(1 row)

real    0m32.565s
user    0m0.002s
sys     0m0.003s

Size of the table data:

[llonergan@kite4 IVP]$ du -sk dgtestdb/base
2121648 dgtestdb/base

System B:

This system is running an XFS filesystem, and has been tuned to use very large (16MB) readahead. It's running the CentOS 4.1 distro, which uses a Linux 2.6.9 kernel.

The same test as above, but with 17GB of data, takes 69.7 seconds to scan (!). That's 244.2 MB/s, which is obviously double my earlier point of 110-120 MB/s. This system is running with a 16MB Linux readahead setting; let's try it with the default (I think) setting of 256KB. AHA! Now we get 171.4 seconds, or 99.3 MB/s.

So, using the tuned setting of "blockdev --setra 16384" we get $6,000 / 244 MB/s = 24.6 $/MB/s. If we use the default Linux setting, it's 2.5x worse.

Raw data:

[llonergan@modena2 IVP]$ cat scan.sh
#!/bin/bash
time psql -c "select count(*) from ivp.bigtable1" dgtestdb
[llonergan@modena2 IVP]$ cat sysout3
  count
----------
 80000000
(1 row)

real    1m9.875s
user    0m0.000s
sys     0m0.004s

[llonergan@modena2 IVP]$ !du
du -sk dgtestdb/base
17021260 dgtestdb/base

Summary:

<cough, cough> OK: you can get more I/O bandwidth out of the current I/O path for sequential scan if you tune the filesystem for large readahead. This is a cheap alternative to overhauling the executor to use async I/O.

Still, there is a CPU limit here. This is not I/O bound; it is CPU limited, as evidenced by the sensitivity to readahead settings. If the filesystem could do 1GB/s, you wouldn't go any faster than 244 MB/s.

- Luke
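
(For anyone reproducing the arithmetic above: a minimal sketch, reusing the scan.sh and du commands from Luke's transcripts; the table, database, and path names are his, and MB here means decimal megabytes, i.e. KB/1000, which is how 17021260 KB / 69.7 s comes out to 244.2 MB/s.)

#!/bin/bash
# Measure sequential-scan throughput: table size on disk divided by the
# wall-clock time of a full-table count(*).
SIZE_KB=$(du -sk dgtestdb/base | awk '{print $1}')  # table size in KB
START=$(date +%s.%N)
psql -c "select count(*) from ivp.bigtable1" dgtestdb > /dev/null
END=$(date +%s.%N)
# Decimal MB/s, matching the figures quoted above.
echo "scale=1; $SIZE_KB / 1000 / ($END - $START)" | bc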