Hello,

On Mon, 29 Sep 2014 10:31:03 +0200 Emmanuel Lacour wrote:

> Dear ceph users,
>
> we have been managing ceph clusters for a year now. Our setup is
> typically made of Supermicro servers with OSD SATA drives and journals
> on SSD.
>
> Those SSDs are all failing one after the other after one year :(
>
Given your SSDs, are they failing after more than 150TB have been
written?

> We used Samsung 850 Pro (120GB) with two setups (small nodes with 2 SSD,
> 2 HD in 1U):
>
> 1) RAID 1 :( (bad idea, each SSD takes all the OSD journal writes :()
> 2) RAID 1 for the OS (nearly no writes) and a dedicated partition for
> journals (one per OSD)
>
> I'm convinced that the second setup is better and we are migrating the
> old setup to it.
>
Yes, the 2nd option is the better one for many reasons and I'm using
that myself.

> Though, statistics show 60GB (option 2) to 100GB (option 1) of writes
> per day on the SSDs on a not really overloaded cluster. Samsung claims
> a 5-year warranty if under 40GB/day. Those numbers seem very low to me.
>
This is confusing, as the Samsung homepage gives a 150TBW lifetime, and
40GB/day over 5 years would be about half of that.

> What are your experiences on this? What write volumes do you encounter,
> on which SSD models, with which setup, and what MTBF?
>
If you read/search this ML it should be clear to you that the only SSDs
that have the durability (and a good TBW/$ ratio when looking at it long
term) are the Intel DC S3700s. Monitor their wearout ratio and you're
likely to never have one fail on you unexpectedly.

A 200GB DC S3700 has a TBW of 1825, more than 10 times that of your
Samsungs, and would allow you to write 1TB each day for 5 years.

Christian
--
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Fusion Communications
http://www.gol.com/
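The endurance arithmetic in this thread is easy to sanity-check. Here is a
small back-of-the-envelope script using only the figures quoted above (the
150 TBW and 1825 TBW ratings and the 40/60/100GB-per-day rates come from
the messages in this thread, not from vendor datasheets):

```python
# Rough SSD endurance estimates from the figures quoted in this thread.
# Decimal units throughout: 1 TB = 1000 GB.

def lifetime_years(tbw_tb: float, writes_gb_per_day: float) -> float:
    """Years until the rated TBW is exhausted at a constant write rate."""
    return tbw_tb * 1000 / writes_gb_per_day / 365

# Samsung's 40GB/day warranty threshold over 5 years:
warranty_tb = 40 * 365 * 5 / 1000
print(f"40GB/day for 5 years = {warranty_tb:.0f} TB written")
# ~73 TB -- indeed about half of the 150 TBW rating.

# At the 60-100GB/day observed on the cluster, a 150 TBW drive is rated
# for roughly 4 to 7 years, so dying after one year (~22-37 TB written)
# is well short of the rating:
for rate in (60, 100):
    print(f"{rate}GB/day: 150 TBW lasts {lifetime_years(150, rate):.1f} years")

# Intel DC S3700 200GB, 1825 TBW: exactly 1TB/day for 5 years.
print(f"1TB/day: 1825 TBW lasts {lifetime_years(1825, 1000):.1f} years")
```

Nothing exotic, just TBW divided by the daily write volume, but it makes
clear why drives failing after one year at 60-100GB/day points at
something other than normal wear on a 150 TBW part.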