Hello,

On Mon, 29 Sep 2014 10:31:03 +0200 Emmanuel Lacour wrote:

> Dear ceph users,
>
> we have been managing ceph clusters for a year now. Our setup is
> typically made of Supermicro servers with OSD SATA drives and journals
> on SSD.
>
> Those SSDs are all failing one after the other after one year :(
>
Given your SSDs, are they failing after more than 150TB have been
written?

> We used Samsung 850 Pro (120GB) with two setups (small nodes with 2 SSD,
> 2 HD in 1U):
>
> 1) RAID 1 :( (bad idea, each SSD takes all the OSD journal writes :()
> 2) RAID 1 for the OS (nearly no writes) and a dedicated partition for
> journals (one per OSD)
>
> I'm convinced that the second setup is better and we are migrating the
> old setup to it.
>
Yes, the 2nd option is the better one for many reasons and I'm using
that myself.

> Though, statistics show 60GB (option 2) to 100GB (option 1) of writes
> per day on the SSDs on a not really overloaded cluster. Samsung claims
> a 5-year warranty if under 40GB/day. Those numbers seem very low to me.
>
This is confusing, as the Samsung homepage gives a 150TBW lifetime, and
40GB/day over 5 years would be about half of that.

> What are your experiences on this? What write volumes do you encounter,
> on which SSD models, with which setup, and what MTBF?
>
If you read/search this ML it should be clear to you that the only SSDs
that have the durability (and a good TBW/$ ratio when looking at it long
term) are the Intel DC S3700s. Monitor their wearout ratio and you're
likely to never have one fail on you unexpectedly.

A 200GB DC S3700 has a TBW of 1825, more than 10 times that of your
Samsungs, and would allow you to write 1TB each day for 5 years.

Christian
--
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Fusion Communications
http://www.gol.com/
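The endurance arithmetic in this thread is easy to sanity-check. Here is a
small back-of-the-envelope script using only the figures quoted above (the
150 TBW and 1825 TBW ratings and the 40/60/100GB-per-day rates come from
the messages in this thread, not from vendor datasheets):

```python
# Rough SSD endurance estimates from the figures quoted in this thread.
# Decimal units throughout: 1 TB = 1000 GB.

def lifetime_years(tbw_tb: float, writes_gb_per_day: float) -> float:
    """Years until the rated TBW is exhausted at a constant write rate."""
    return tbw_tb * 1000 / writes_gb_per_day / 365

# Samsung's 40GB/day warranty threshold over 5 years:
warranty_tb = 40 * 365 * 5 / 1000
print(f"40GB/day for 5 years = {warranty_tb:.0f} TB written")
# ~73 TB -- indeed about half of the 150 TBW rating.

# At the 60-100GB/day observed on the cluster, a 150 TBW drive is rated
# for roughly 4 to 7 years, so dying after one year (~22-37 TB written)
# is well short of the rating:
for rate in (60, 100):
    print(f"{rate}GB/day: 150 TBW lasts {lifetime_years(150, rate):.1f} years")

# Intel DC S3700 200GB, 1825 TBW: exactly 1TB/day for 5 years.
print(f"1TB/day: 1825 TBW lasts {lifetime_years(1825, 1000):.1f} years")
```

Nothing exotic, just TBW divided by the daily write volume, but it makes
clear why drives failing after one year at 60-100GB/day points at
something other than normal wear on a 150 TBW part.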