Hi,

On 19.2.2014 03:45, KONDO Mitsumasa wrote:
> (2014/02/19 5:41), Tomas Vondra wrote:
>> On 18.2.2014 02:23, KONDO Mitsumasa wrote:
>>> Hi,
>>>
>>> I don't have a PERC H710 RAID controller, but I think he would like
>>> to know which RAID striping/chunk size or read/write cache ratio in
>>> the writeback-cache setting is best. I'd like to know it, too :)
>>
>> The stripe size is actually a very good question. On spinning drives
>> it usually does not matter too much - unless you have a very
>> specialized workload, the 'medium size' is the right choice (AFAIK
>> we're using 64kB on the H710, which is the default).
>
> It is interesting that the RAID stripe size of the PERC H710 is 64kB.
> On HP RAID cards the default chunk size is 256kB. If we use two disks
> in RAID 0, the full stripe size will be 512kB. I think that might be
> too big, but it may be optimized inside the RAID card... In practice
> it isn't bad with those settings.

With HP controllers this depends on the RAID level (and maybe even the
controller). Which HP controller are you talking about? I have some
basic experience with the P400/P800, and those have 16kB (RAID6), 64kB
(RAID5) or 128kB (RAID10) defaults. None of them uses 256kB.

See http://bit.ly/1bN3gIs (P800) and http://bit.ly/MdsEKN (P400).

> I'm interested in the RAID card's internal behavior. Fortunately, the
> Linux RAID card driver is open source, so we could look at the source
> code when we have time.

What do you mean by "Linux RAID card driver"? AFAIK the admin tools may
be available, but the interesting stuff happens inside the controller,
and that's still proprietary.

>> With SSDs this might actually matter much more, as SSDs work with
>> "erase blocks" (mostly 512kB), and I suspect using a small stripe
>> might result in repeated writes to the same block - overwriting one
>> block repeatedly and thus increasing wearout. But maybe the
>> controller will handle that just fine, e.g. by coalescing the writes
>> and sending them to the drive as a single write. Or maybe the drive
>> can do that in its local write cache (all SSDs have that).
>
> I have heard that genuine RAID cards with genuine SSDs are optimized
> for those SSDs. It is important to use compatible SSDs for
> performance. In the worst case, the lifetime of the SSD will be short
> and performance will be bad.

Well, that's the main question here, right? Because if the "worst case"
actually turns out to be true, then what's the point of SSDs? You have a
disk that does not provide the performance you expected, died much
sooner than you expected, and maybe so suddenly that it interrupted
operations.

So instead of paying more for higher performance, you paid more for bad
performance and a much shorter life of the disk.

Coincidentally, we're currently trying to find the answer to this
question too. That is - how long will the SSD endure in that particular
RAID level? Does it pay off?

BTW what do you mean by "genuine raid card" and "genuine ssds"?

> I'm wondering about the effectiveness of readahead in the OS and in
> the RAID card. In general, data read ahead by the RAID card is stored
> in the RAID cache, not in the OS cache. Data read ahead by the OS is
> stored in the OS cache. I'd like to use all of the RAID cache for
> write caching only, because fsync() becomes faster. But then the RAID
> card can't do much readahead... If we hope to use it more effectively,
> we have to clear it, but it seems difficult :(

I've done a lot of testing of this on the H710 in 2012 (~18 months
ago), measuring combinations of:

* read-ahead on the controller (adaptive, enabled, disabled)
* read-ahead in the kernel (with various sizes)
* I/O scheduler

The test was the simplest and most suitable workload for this - just
"dd" with a 1MB block size (AFAIK, I'd have to check the scripts).

In short, my findings are that:

* read-ahead in the kernel matters - tweak this
* read-ahead on the controller sucks - it either makes no difference,
  or actually harms performance (adaptive, combined with small kernel
  read-ahead values)
* the scheduler made no difference (at least for this workload)

So we disable read-ahead on the controller, use 24576 for the kernel,
and it works fine.
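For reference, such a sweep can look roughly like the sketch below.
This is just an illustration, not the actual 2012 scripts - the device
name, block counts and the list of read-ahead values are placeholders:

    #!/bin/bash
    # Sweep kernel read-ahead settings and measure sequential reads.
    # DEV and the value list are placeholders - adjust to your system.
    # (Controller read-ahead is changed separately through the vendor
    # tools, e.g. MegaCLI for the H710 - not shown here.)
    DEV=/dev/sdb

    for RA in 256 4096 8192 24576; do
        # set kernel read-ahead for the device (value in 512B sectors)
        blockdev --setra $RA $DEV

        # start each run with a cold page cache
        sync
        echo 3 > /proc/sys/vm/drop_caches

        # sequential read, 1MB blocks, 16GB total
        echo "read-ahead = $RA sectors"
        dd if=$DEV of=/dev/null bs=1M count=16384
    done

Note that "blockdev --setra/--getra" works in 512-byte sectors, while
/sys/block/<dev>/queue/read_ahead_kb is in kilobytes, so the same
setting shows up as two different numbers.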
I've done the same test with a FusionIO ioDrive (attached to PCIe, not
through the controller) - absolutely no difference.

Tomas