On Tue, Oct 20, 2015 at 10:14 AM, Tomas Vondra
<tomas.vondra@xxxxxxxxxxxxxxx> wrote:
> Hi,
>
> On 10/20/2015 03:30 PM, Merlin Moncure wrote:
>>
>> On Tue, Oct 20, 2015 at 3:14 AM, Birta Levente <blevi.linux@xxxxxxxxx>
>> wrote:
>>>
>>> Hi
>>>
>>> I have a supermicro SYS-1028R-MCTR, LSI3108 integrated with SuperCap
>>> module (BTR-TFM8G-LSICVM02)
>>> - 2x300GB 10k spin drive, as raid 1 (OS)
>>> - 2x300GB 15k spin drive, as raid 1 (for xlog)
>>> - 2x200GB Intel DC S3710 SSD (for DB), as raid 1
>>>
>>> So how is better for the SSDs: mdraid or controller's raid?
>>
>>
>> I personally always prefer mdraid if given a choice, especially when
>> you have a dedicated boot drive. It's better in DR scenarios and for
>> hardware migrations. Personally I find dedicated RAID controllers to
>> be baroque. Flash SSDs (at least the good ones) are basically big
>> RAID 0s with their own dedicated cache, supercap, and controller
>> optimized to the underlying storage peculiarities.
>
> I don't know - I've always treated mdraid with a bit of suspicion as it does
> not have any "global" write cache, which might be allowing failure modes
> akin to the RAID5 write hole (similar issues exist for non-parity RAID
> levels like RAID-1 or RAID-10).

mdadm is pretty smart. It knows when it was shut down uncleanly and
recalculates parity as needed. There are some theoretical edge-case
failure scenarios, but they are well understood. That is md's main
advantage, really: its transparency and the huge body of lore around
it. I have a tiny data recovery side business (cost $0, invitation
only) doing DR on NAS systems, some of which commercial DR companies
had declared unrecoverable. By simply googling and following guides I
was able to come up with the data, or at least most of it, every time.
Good luck with that on proprietary RAID systems. In fact, there is no
reason to believe that proprietary systems cover the write hole even
if they have a centralized cache. They may claim they do, and in fact
do so 99 times out of 100, but how do you know it's really covered?
Basically, you don't. I kind of trust Intel (now; it's been a
journey), but I don't have a lot of confidence in certain enterprise
gear vendors.

On Tue, Oct 20, 2015 at 9:33 AM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
> We're running LSI MegaRAIDs at work with 10 SSD RAID-5 arrays, and we
> can get ~5k to 7k tps on a -s 10000 pgbench with the write cache on.
>
> When we turn the write cache off, we get 15k to 20k tps. This is on a
> 120GB pgbench db that fits in memory, so it's all writes.

These are my findings exactly. I'll double down on my statement:
caching RAID controllers are essentially obsolete technology. They are
designed to solve a problem that simply doesn't exist any more because
of SSDs. Unless your database is very, very busy, it's pretty hard to
saturate a single low-to-mid-tier SSD, even with zero engineering
effort. It's time to let go: spinning drives are obsolete in the
database world, at least in any scenario where you're measuring IOPS.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
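
For readers following the thread, the RAID5 write hole and the parity recalculation after an unclean shutdown can be illustrated with a toy model. This is only a sketch in Python, not how md is implemented: the helper names `parity` and `rebuild` and the 4-byte "blocks" are made up for illustration, and a real array works on on-disk stripes.

```python
from functools import reduce

def parity(blocks):
    """XOR parity over equal-length byte blocks, as in RAID 5."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild(blocks, par, lost):
    """Reconstruct the block at index `lost` from the survivors plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != lost]
    return parity(survivors + [par])

# A consistent stripe: three data blocks and their parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
par = parity(data)
assert rebuild(data, par, lost=1) == b"BBBB"

# Write hole: block 0 reaches disk, but the crash hits before parity is updated.
data[0] = b"ZZZZ"
stale_rebuild = rebuild(data, par, lost=1)  # pretend disk 1 fails afterwards
hole_corrupts = stale_rebuild != b"BBBB"    # stale parity yields wrong data

# Resync after the unclean shutdown: recompute parity from the data blocks.
par = parity(data)
resynced_rebuild = rebuild(data, par, lost=1)
```

In this model the resync only helps while every data block is still readable. If a disk is lost before parity has been recomputed, the stale parity silently reconstructs the wrong contents, which is the edge case the thread is arguing about.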