I am unsure about the performance side, but ZFS is generally very attractive to me.

Key advantages:

1) Checksumming and automatic fixing-of-broken-things on every file (not just postgres pages, but your scripts, O/S, program files).

2) Built-in lightweight compression. It doesn't help with TOAST tables (their data is usually already compressed by postgres) and may in fact slow them down, but it is helpful for other things. It may actually be a net negative for pg, so maybe turn it off.

3) ZFS mirroring or RAID-Z1/RAID-Z2 (the ZFS counterparts of RAID5/6). If you have trouble persuading someone that it's safe to replace a RAID array with a single drive... you can use a couple of NVMe SSDs in a ZFS mirror or RAID-Z, and get the same availability you'd get from a RAID controller. Slightly better, arguably, since they claim to have fixed the RAID write-hole problem.

4) Filesystem snapshotting.

Despite the costs of checksumming etc., I suspect RAID-Z running on a fast CPU with multiple NVMe drives will outperform quite a lot of the alternatives, with great data integrity guarantees. Haven't built one yet. Hope to, later this year.

Steve, I would love to know more about how you're getting on with your NVMe disk in postgres!

Graeme.

On 07 Jul 2015, at 12:28, Mkrtchyan, Tigran <tigran.mkrtchyan@xxxxxxx> wrote:

> Thanks for the Info.
>
> So if RAID controllers are not an option, what should one use to build
> big databases? LVM with xfs? BtrFs? Zfs?
>
> Tigran.
>
> ----- Original Message -----
>> From: "Graeme B. Bell" <graeme.bell@xxxxxxxx>
>> To: "Steve Crawford" <scrawford@xxxxxxxxxxxxxxxxxxxx>
>> Cc: "Wes Vaske (wvaske)" <wvaske@xxxxxxxxxx>, "pgsql-performance" <pgsql-performance@xxxxxxxxxxxxxx>
>> Sent: Tuesday, July 7, 2015 12:22:00 PM
>> Subject: Re: New server: SSD/RAID recommendations?
>
>> Completely agree with Steve.
>>
>> 1. Intel NVMe looks like the best bet if you have modern enough hardware for
>> NVMe. Otherwise e.g. S3700 mentioned elsewhere.
>>
>> 2. RAID controllers.
>>
>> We have e.g. 10-12 of these here and e.g. 25-30 SSDs, among various machines.
>> This might give people an idea about where the risk lies in the path from disk to
>> CPU.
>>
>> We've had 2 RAID card failures in the last 12 months that nuked the array with
>> days of downtime, and 2 problems with batteries suddenly becoming useless or
>> suddenly reporting wildly varying temperatures/overheating. There may have been
>> other RAID problems I don't know about.
>>
>> Our IT dept were replacing Seagate HDDs last year at a rate of 2-3 per week (I
>> guess they have 100-200 disks?). We also have about 25-30 Hitachi/HGST HDDs.
>>
>> So by my estimates:
>> 30% annual problem rate with RAID controllers
>> 30-50% failure rate with Seagate HDDs (Backblaze saw similar results)
>> 0% failure rate with HGST HDDs.
>> 0% failure in our SSDs. (to be fair, our one Samsung SSD apparently has a bug
>> in TRIM under Linux, which I'll need to investigate to see if we have been
>> affected by).
>>
>> Also, RAID controllers aren't free - not just the money but also the management
>> of them (ever tried writing a complex install script that interacts with
>> MegaCLI? It can be done but it's not much fun.). Just take a look at the
>> MegaCLI manual and ask yourself... is this even worth it (if you have a good
>> MTBF on an enterprise SSD).
>>
>> RAID was meant to be about ensuring availability of data. I have trouble
>> believing that these days....
>>
>> Graeme Bell
>>
>>
>> On 06 Jul 2015, at 18:56, Steve Crawford <scrawford@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>>
>>> 2. We don't typically have redundant electronic components in our servers. Sure,
>>> we have dual power supplies and dual NICs (though generally to handle external
>>> failures) and ECC-RAM but no hot-backup CPU or redundant RAM banks and...no
>>> backup RAID card. Intel Enterprise SSDs already have power-fail protection so I
>>> don't need a RAID card to give me BBU.
>>> Given the MTBF of good enterprise SSD
>>> I'm left to wonder if placing a RAID card in front merely adds a new point of
>>> failure and scheduled-downtime-inducing hands-on maintenance (I'm looking at
>>> you, RAID backup battery).
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance