On Thu, 27 Dec 2007, Shane Ambler wrote:
> So in theory a modern RAID 1 setup can be configured to get similar read
> speeds as RAID 0 but would still drop to single disk speeds (or similar) when
> writing, but RAID 0 can get the faster write performance.
The trick is, you need a perfect controller that scatters individual reads
evenly across the two disks as sequential reads move along the disk to
pull this off, bouncing between a RAID 1 pair to use all the bandwidth
available. There are caches inside the disk, read-ahead strategies as
well, and that all has to line up just right for a single client to get
all the bandwidth. Real-world disks and controllers don't quite behave
well enough for that to predictably deliver what you might expect from
theory. With RAID 0, getting the full read speed of twice a single drive
is much more likely to actually happen than in RAID 1.
> So in a perfect setup (probably 1+0) 4x 300MB/s SATA drives could
> deliver 1200MB/s of data to RAM, which is also assuming that all 4
> channels have their own data path to RAM and aren't sharing. (anyone
> know how segregated the on board controllers such as these are?) (do
> some pci controllers offer better throughput?)
OK, first off, beyond the occasional trivial burst you'll be hard pressed
to ever sustain over 60MB/s out of any single SATA drive. So the
theoretical max 4-channel speed is closer to 240MB/s.
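To put numbers on that, here is a minimal Python sketch of the arithmetic. The 300MB/s interface rate and the 60MB/s sustained rate are the figures from this thread; the function name is only for illustration.

```python
def aggregate_throughput_mb_s(drives: int, per_drive_mb_s: float) -> float:
    """Best-case aggregate rate if every drive streams in parallel."""
    return drives * per_drive_mb_s

# SATA link rate per channel, as assumed in the quoted question.
interface_total = aggregate_throughput_mb_s(4, 300)
# Rough sustained rate of a single spinning SATA drive, per the reply above.
sustained_total = aggregate_throughput_mb_s(4, 60)

fraction = sustained_total / interface_total
print(f"interface: {interface_total:.0f}MB/s, "
      f"sustained: {sustained_total:.0f}MB/s, "
      f"ratio: {fraction:.0%}")
# prints: interface: 1200MB/s, sustained: 240MB/s, ratio: 20%
```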
A regular PCI bus tops out at a theoretical 133MB/s, and you sure can
saturate one with 4 disks and a good controller. This is why server
configurations have controller cards that use PCI-X (1024MB/s) or lately
PCI-e aka PCI Express (250MB/s for each lane, with up to 16 lanes being
common). If your SATA ports are on the motherboard, they're probably
driven by an integrated controller in the Southbridge, AKA the ICH. That's
probably got 250MB/s or more and in current products can easily outrun
most sets of disks you'll ever connect. Even on motherboards that support
8 SATA channels, it will be difficult to push the system much beyond
250MB/s once you're dealing with real-world workloads, even if the drives
could potentially do more.
If you have multiple SATA controllers each with their own set of disks,
then you're back to having to worry about the bus limits. So, yes, there
are bus throughput considerations here, but unless you're building a giant
array or using some older bus technology you're unlikely to hit them with
spinning SATA disks.
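A rough way to check whether the bus or the disks are the limit is to take the smaller of the two numbers. The sketch below uses the theoretical bus figures quoted above; the x4 and x16 lane widths and the helper names are illustrative assumptions, and real controllers will deliver less than these theoretical limits.

```python
# Theoretical bus limits in MB/s, using the figures from this message.
BUS_LIMITS_MB_S = {
    "PCI": 133,
    "PCI-X": 1024,
    "PCI-e x1": 250,
    "PCI-e x4": 4 * 250,    # assumed lane width, for illustration
    "PCI-e x16": 16 * 250,  # assumed lane width, for illustration
}

def effective_throughput(drives: int, per_drive_mb_s: float, bus: str) -> float:
    """Throughput is capped by whichever is slower: the disks or the bus."""
    return min(drives * per_drive_mb_s, BUS_LIMITS_MB_S[bus])

def bottleneck(drives: int, per_drive_mb_s: float, bus: str) -> str:
    """Name the limiting component for this configuration."""
    return "bus" if drives * per_drive_mb_s > BUS_LIMITS_MB_S[bus] else "disks"

# Four drives at a sustained 60MB/s each, on each bus type.
for bus in BUS_LIMITS_MB_S:
    rate = effective_throughput(4, 60, bus)
    print(f"{bus:10s} {rate:5.0f}MB/s (limited by {bottleneck(4, 60, bus)})")
```

With four 60MB/s drives only plain PCI comes out as the limit, which matches the point that you need a big array or an old bus before this matters.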
> We all know that doesn't happen in the real world ;-) Let's say we are
> restricted to 80% - 1000MB/s
Yeah, as mentioned above it's actually closer to 20%.
While your numbers are off by a bunch, the reality for database use means
these computations don't matter much anyway. Seek-related behavior drives
far more of this than sequential throughput does, and decisions like
whether to split out the OS or WAL or whatever need to factor all of that
in, rather than just the theoretical I/O.
For example, one reason it's popular to split the WAL onto another disk is
that under normal operation the disk never does a seek. So if there's a
dedicated disk for that, the disk just writes but never moves much.
Whereas if the WAL is shared, the disk has to jump between writing that data
and whatever else is going on, and peak possible WAL throughput is waaaay
slower because of those seeks. (Note that unless you have a bunch of
disks, your WAL is unlikely to be a limiter anyway so you still may not
want to make it separate).
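A crude model shows how much the seeks can cost. Everything numeric here is an assumption chosen for illustration rather than a measurement: an 8ms combined seek and rotational delay, 8kB writes, and the 60MB/s sequential rate used earlier. It also pessimistically charges a full seek to every write on the shared disk.

```python
def write_throughput_mb_s(write_kb: float, seek_ms: float,
                          sequential_mb_s: float) -> float:
    """Effective rate when each write pays seek_ms before transferring."""
    size_mb = write_kb / 1024
    transfer_s = size_mb / sequential_mb_s  # time spent actually writing
    seek_s = seek_ms / 1000                 # time spent repositioning the head
    return size_mb / (seek_s + transfer_s)

# Dedicated WAL disk: the head stays put, so no seek cost per write.
dedicated = write_throughput_mb_s(write_kb=8, seek_ms=0, sequential_mb_s=60)
# Shared disk: assume the head has moved away before every WAL write.
shared = write_throughput_mb_s(write_kb=8, seek_ms=8, sequential_mb_s=60)

print(f"dedicated: {dedicated:.1f}MB/s, shared: {shared:.2f}MB/s")
```

Under these assumptions the shared case comes out well over an order of magnitude slower, but the exact ratio depends entirely on the assumed seek time and write size.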
(This topic so badly needs a PostgreSQL specific FAQ)
--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD