On Apr 3, 2007, at 6:54 PM, Geoff Tolley wrote:
I don't think the density difference will be quite as high as you seem to think: most 320GB SATA drives are going to be 3-4 platters, the most that a 73GB SCSI is going to have is 2, and more likely 1, which would make the SCSIs more like 50% the density of the SATAs. Note that this only really makes a difference to theoretical sequential speeds; if the seeks are random the SCSI drives could easily get there 50% faster (lower rotational latency and they certainly will have better actuators for the heads). Individual 15K SCSIs will trounce 7.2K SATAs in terms of i/os per second.
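As a rough check on the rotational-latency point above, here is a minimal sketch of the arithmetic. Average rotational latency is half a revolution, which follows directly from the RPM. The seek times are illustrative assumptions, not measured figures for any particular drive, and the function names are made up for this example.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the target sector: half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def random_iops(rpm, avg_seek_ms):
    """Crude random I/O estimate: one seek plus half a rotation per request.

    Ignores command queuing, caching, and transfer time.
    """
    return 1000.0 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


# (rpm, assumed average seek in ms); the seek values are placeholders
drives = {
    "15K SCSI": (15000, 3.5),
    "7.2K SATA": (7200, 8.5),
}

for name, (rpm, seek_ms) in drives.items():
    print(
        f"{name}: rotational latency {avg_rotational_latency_ms(rpm):.2f} ms, "
        f"roughly {random_iops(rpm, seek_ms):.0f} random I/Os per second"
    )
```

Plugging in seek times from a drive's datasheet or a published review would give a better estimate than the placeholders used here.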
Good point. On another note, I am wondering why nobody's brought up the command-queuing perf benefits (yet). Is this because SATA and SCSI are on par here? I'm finding conflicting information on this -- some calling SATA's NCQ mostly crap, others stating the real-world results are negligible. I'm inclined to believe SCSI's pretty far ahead here but am having trouble finding recent articles on this.
What I always do when examining hard drive options is to see if they've been tested (or a similar model has) at http://www.storagereview.com/ - they have a great database there with lots of low-level information (although it seems to be down at the time of writing).
Still down! They might want to get better drives... j/k.
But what's likely to make the largest difference in the OP's case (many inserts) is write caching, and a battery-backed cache would be needed for this. This will help mask write latency differences between the two options, and so benefit SATA more. Some 3ware cards offer it, some don't, so check the model.
The servers are hooked up to a reliable UPS. The battery-backed cache won't hurt but might be overkill (?).
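To put a number on the write-latency point: a common rule of thumb is that, without a write-back cache, a single connection issuing one synchronous commit at a time cannot commit faster than about once per platter revolution. The sketch below assumes exactly that (no group commit, no write cache, the drive honours the flush), so treat it as an upper-bound estimate rather than a benchmark. The function name is hypothetical.

```python
def max_serial_commits_per_sec(rpm):
    """Rough ceiling on commits/sec for one connection with no write cache.

    Assumes each commit must wait for the WAL position to come back
    around under the head, i.e. about one commit per revolution.
    """
    return rpm / 60.0


for name, rpm in (("7.2K SATA", 7200), ("15K SCSI", 15000)):
    print(f"{name}: at most ~{max_serial_commits_per_sec(rpm):.0f} commits/sec per connection")
```

A battery-backed write cache acknowledges the write before it reaches the platter, which is why it narrows the gap between the two drive types for insert-heavy loads.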
How the drives are arranged is going to be important too - one big RAID 10 is going to be rather worse than having arrays dedicated to each of pg_xlog, indices and tables, and on that front the SATA option is going to grant more flexibility.
I've read some recent advice to the contrary, specifically recommending putting all files (pg_xlog, indices, etc.) on one huge RAID array and letting the drives load-balance by brute force. I know the PostgreSQL documentation claims up to 13% more perf for moving the pg_xlog to its own device(s) -- but by sharing everything on a huge array you lose a small amount of perf (when compared to the theoretically optimal solution), versus being significantly off optimal perf if you partition your tables/files wrongly. I'm willing to do reasonable benchmarking, but time is money -- and reconfiguring huge arrays in multiple configurations to possibly get incremental perf might not be as cost-efficient as just spending more on hardware.
Thanks for all the tips.