On 12/22/2015 05:36 PM, Tyler Bishop wrote:
> Write endurance is kinda bullshit.
>
> We have Crucial 960GB drives storing data and we've only managed to take
> 2% off the drives' life in a year, with hundreds of TB written weekly.
>
> Stuff is way more durable than anyone gives it credit for.

No, that is absolutely not true. I've seen multiple SSDs fail in Ceph
clusters. Small Samsung 850 Pro SSDs wore out within 4 months in heavily
write-intensive Ceph clusters.

> ----- Original Message -----
> From: "Lionel Bouton" <lionel+ceph@xxxxxxxxxxx>
> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Tuesday, December 22, 2015 11:04:26 AM
> Subject: Re: Intel S3710 400GB and Samsung PM863 480GB fio results
>
> Le 22/12/2015 13:43, Andrei Mikhailovsky a écrit :
>> Hello guys,
>>
>> Was wondering if anyone has done testing on the Samsung PM863 120 GB
>> version to see how it performs? IMHO the 480 GB version seems like a
>> waste for the journal, as you only need a small disk to fit 3-4 OSD
>> journals. Unless you get far greater durability.
>
> The problem is endurance. If we use the 480GB model for 3 OSDs each on
> the cluster we might build, we expect 3 years (with some margin for
> error, but not including any write amplification at the SSD level)
> before the SSDs fail.
> In our context a 120GB model might not even last a year (its endurance
> is 1/4 that of the 480GB model). This is why the SM863 models will
> probably be more suitable if you have access to them: you can use
> smaller ones which cost less and get more endurance (you'll have to
> check the performance, though; smaller models usually have lower IOPS
> and bandwidth).
>
>> I am planning to replace my current journal SSDs over the next month
>> or so and would like to find out if there is a good alternative to
>> Intel's 3700/3500 series.
>
> The 3700s are a safe bet (the 100GB model is rated for ~1.8 PBW). The
> 3500 models probably don't have enough endurance for many Ceph clusters
> to be cost effective: the 120GB model is only rated for 70 TBW, and you
> have to account for both client writes and rebalance events.
> I'm uneasy with SSDs expected to fail within the life of the system
> they are in: you can have a cascade effect where an SSD failure brings
> down several OSDs, triggering a rebalance which might make SSDs
> installed at the same time fail too. In the best case you then reach
> your min_size (>= 2), which blocks writes and prevents further SSD
> failures until you move the journals to fresh SSDs. If min_size = 1,
> you might actually lose data.
>
> If you plan to replace your current journal SSDs, I would make a
> staggered deployment over several months or a year so they don't all
> fail at the same time in case of an unforeseen problem. This also lets
> you evaluate the performance and behavior of a new SSD model with your
> hardware (there have been reports of performance problems with some
> combinations of RAID controllers and SSD models/firmware versions)
> without impacting your cluster's overall performance too much.
>
> When using SSDs for journals you have to monitor both:
> * the wear leveling (or something equivalent) of each SSD; SMART data
>   may not be available behind a RAID controller, but you can usually
>   still get the total amount of data written,
> * the client writes on the whole cluster.
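A minimal sketch of that per-SSD wear monitoring, assuming smartctl is
installed and the script runs with root privileges. The SMART attribute
names are vendor specific (Media_Wearout_Indicator on Intel and
Wear_Leveling_Count on Samsung are common, but check your model) and the
device paths below are placeholders:

#!/usr/bin/env python
# Sketch only: poll SMART wear data for journal SSDs.
# Attribute names and device paths are assumptions; adjust for your drives.
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]                  # placeholder journal SSDs
WEAR_ATTRS = ("Media_Wearout_Indicator", "Wear_Leveling_Count")
WRITTEN_ATTR = "Total_LBAs_Written"                 # attribute 241 on many drives

def smart_attributes(dev):
    """Return {attribute_name: (normalized_value, raw_value)} from 'smartctl -A'."""
    out = subprocess.check_output(["smartctl", "-A", dev]).decode()
    attrs = {}
    for line in out.splitlines():
        f = line.split()
        # Attribute rows look like:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(f) >= 10 and f[0].isdigit():
            attrs[f[1]] = (f[3], f[9])
    return attrs

for dev in DEVICES:
    attrs = smart_attributes(dev)
    wear = next((attrs[a][0] for a in WEAR_ATTRS if a in attrs), "n/a")
    written = attrs.get(WRITTEN_ATTR, ("n/a", "n/a"))[1]
    print("%s: wear (normalized, 100 = new) = %s, total LBAs written = %s"
          % (dev, wear, written))

Feeding this output into whatever monitoring you already run (Nagios,
Zabbix, Graphite, ...) lets you watch the wear trend next to the
cluster-wide client write rate.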
> And check periodically what expected lifespan is left for each of your
> SSDs based on their current state, average write speed, estimated write
> amplification (both due to the pool's size parameter and the SSD
> model's inherent write amplification) and the amount of data you expect
> rebalance events to move.
> Ideally you should make this computation before choosing the SSD
> models, but several variables are not always easy to predict and will
> probably change during the life of your cluster.
>
> Lionel
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
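To make the lifespan computation Lionel describes concrete, here is a
rough back-of-the-envelope sketch. Every number in it (endurance rating,
client write rate, rebalance volume, amplification factor, OSD counts)
is a hypothetical placeholder, not a measurement from this thread:

#!/usr/bin/env python
# Back-of-the-envelope journal-SSD lifespan estimate.
# All values are hypothetical placeholders; plug in your own drive rating,
# measured client write rate, pool size and expected rebalance volume.

TBW_RATING = 70.0                # drive endurance rating, TB written
CLIENT_WRITES_TB_PER_DAY = 0.5   # average client writes, cluster-wide
POOL_SIZE = 3                    # each client write hits 'size' journals
OSDS_TOTAL = 30                  # OSDs in the cluster
OSDS_PER_SSD = 3                 # journals colocated on each SSD
REBALANCE_TB_PER_YEAR = 20.0     # data you expect rebalances to rewrite
WRITE_AMPLIFICATION = 1.5        # SSD-internal amplification, model dependent

# Total data written through all journals per year, assuming client
# writes are spread evenly across the journal SSDs.
journal_tb_per_year = (CLIENT_WRITES_TB_PER_DAY * 365 * POOL_SIZE
                       + REBALANCE_TB_PER_YEAR)
ssd_count = OSDS_TOTAL / float(OSDS_PER_SSD)
per_ssd_tb_per_year = journal_tb_per_year / ssd_count * WRITE_AMPLIFICATION

years = TBW_RATING / per_ssd_tb_per_year
print("Estimated %.0f TB/year written per journal SSD" % per_ssd_tb_per_year)
print("Expected lifetime at the rated endurance: %.1f years" % years)

With these placeholder numbers a 70 TBW drive comes out at well under a
year, which lines up with the point above that small, low-endurance
models are risky as journal devices.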