I would also add that journal activity is write intensive, so a small part of the drive gets excessive writes if the journal and data are co-located on an SSD. The same applies when a single SSD holds the journals for many HDDs.

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Wido den Hollander
Sent: Tuesday, December 22, 2015 11:46 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Intel S3710 400GB and Samsung PM863 480GB fio results

On 12/22/2015 05:36 PM, Tyler Bishop wrote:
> Write endurance is kinda bullshit.
>
> We have Crucial 960GB drives storing data and we've only managed to take 2% off the drives' life over a year, with hundreds of TB written weekly.
>
> Stuff is way more durable than anyone gives it credit.
>

No, that is absolutely not true. I've seen multiple SSDs fail in Ceph clusters. Small Samsung 850 Pro SSDs wore out within 4 months in heavy write-intensive Ceph clusters.

> ----- Original Message -----
> From: "Lionel Bouton" <lionel+ceph@xxxxxxxxxxx>
> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Tuesday, December 22, 2015 11:04:26 AM
> Subject: Re: Intel S3710 400GB and Samsung PM863 480GB fio results
>
> On 22/12/2015 13:43, Andrei Mikhailovsky wrote:
>> Hello guys,
>>
>> Was wondering if anyone has done testing on the Samsung PM863 120GB version to see how it performs? IMHO the 480GB version seems like a waste for the journal, as you only need a small disk to fit 3-4 OSD journals, unless you get far greater durability.
>
> The problem is endurance. If we use the 480GB model for 3 OSDs each on the cluster we might build, we expect 3 years (with some margin for error, but not including any write amplification at the SSD level) before the SSDs fail.
> In our context a 120GB model might not even last a year (its endurance is a quarter that of the 480GB model). This is why the SM863 models will probably be more suitable if you have access to them: you can use smaller ones which cost less and offer more endurance (check the performance though; smaller models usually have lower IOPS and bandwidth).
>
>> I am planning to replace my current journal SSDs over the next month or so and would like to find out if there is a good alternative to Intel's 3700/3500 series.
>
> The 3700 is a safe bet (the 100GB model is rated for ~1.8 PBW). The 3500 models probably don't have enough endurance for many Ceph clusters to be cost effective: the 120GB model is only rated for 70 TBW, and you have to account for both client writes and rebalance events.
> I'm uneasy with SSDs expected to fail within the life of the system they are in: you can get a cascade effect where one SSD failure brings down several OSDs, triggering a rebalance which may make SSDs installed at the same time fail too. In the best scenario you will reach your min_size (>=2) and writes will be blocked, which prevents further SSD failures until you move the journals to fresh SSDs. If min_size = 1 you might actually lose data.
>
> If you expect to replace your current journal SSDs, I would stagger the deployment over several months to a year so that they don't all fail at the same time if an unforeseen problem turns up. In addition, this lets you evaluate the performance and behavior of a new SSD model with your hardware (there have been reports of performance problems with some combinations of RAID controllers and SSD models/firmware versions) without impacting your cluster's overall performance too much.
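To put rough numbers on the endurance Lionel describes, here is a back-of-the-envelope sketch in Python. The TBW ratings are the ones quoted above (~1800 TBW for the S3700 100GB, 70 TBW for the S3500 120GB); the cluster write rate, OSD count, journals per SSD and fudge factors are made-up placeholders you would replace with your own figures.

# Back-of-the-envelope lifetime estimate for a journal SSD.
def journal_ssd_lifetime_years(tbw_rating_tb,
                               cluster_client_write_mb_s,
                               num_osds,
                               journals_per_ssd,
                               pool_size,
                               ssd_write_amplification=1.0,
                               rebalance_factor=1.2):
    # Each client write is stored 'pool_size' times and every copy passes
    # through its OSD's journal, so the journals collectively absorb
    # client rate * pool_size.  Take one OSD's share, multiply by the
    # journals sharing this SSD, and inflate by SSD-level write
    # amplification plus a factor for rebalance traffic.
    per_osd_mb_s = cluster_client_write_mb_s * pool_size / float(num_osds)
    ssd_mb_s = (per_osd_mb_s * journals_per_ssd
                * ssd_write_amplification * rebalance_factor)
    tb_per_year = ssd_mb_s * 3600 * 24 * 365 / 1e6
    return tbw_rating_tb / tb_per_year

# TBW figures quoted in this thread; the cluster figures are placeholders.
for name, tbw in (("Intel S3700 100GB", 1800), ("Intel S3500 120GB", 70)):
    years = journal_ssd_lifetime_years(tbw,
                                       cluster_client_write_mb_s=100,
                                       num_osds=36,
                                       journals_per_ssd=4,
                                       pool_size=3)
    print("%s: ~%.1f years" % (name, years))

With those (made-up) inputs the 70 TBW drive is gone in a matter of weeks while the ~1800 TBW drive lasts on the order of a year, which is exactly the gap Lionel is pointing at.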
> When using SSDs for journals you have to monitor both:
> * the wear leveling indicator (or something equivalent) of each SSD; SMART data may not be available if you use a RAID controller, but you can usually still get the total amount of data written,
> * the client writes on the whole cluster.
> Then check periodically how much lifespan each of your SSDs has left, based on its current state, the average write speed, the estimated write amplification (both from the pool's size parameter and from the SSD model's inherent write amplification) and the amount of data you expect rebalance events to move.
> Ideally you should make this computation before choosing the SSD models, but several variables are not easy to predict and will probably change during the life of your cluster.
>
> Lionel

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
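As a starting point for the monitoring Lionel suggests, here is a small sketch that pulls the total host writes from SMART and extrapolates how much rated endurance is left. It assumes smartctl is installed, that the drive reports a Total_LBAs_Written attribute in 512-byte sectors (attribute names and units vary by vendor, so check your model), and that you know the drive's TBW rating and how long it has been in service; behind a RAID controller you may need smartctl's -d option or the controller's own CLI instead.

import re
import subprocess

def total_tb_written(device):
    # Parse 'smartctl -A' output and read the raw value of the
    # Total_LBAs_Written attribute (the last column of its line).
    out = subprocess.check_output(["smartctl", "-A", device]).decode("utf-8", "replace")
    m = re.search(r"Total_LBAs_Written.*?(\d+)\s*$", out, re.MULTILINE)
    if not m:
        raise RuntimeError("no Total_LBAs_Written attribute on %s" % device)
    return int(m.group(1)) * 512 / 1e12   # 512-byte LBAs -> TB

def days_left(device, tbw_rating_tb, days_in_service):
    # Extrapolate linearly from the average write rate seen so far.
    written_tb = total_tb_written(device)
    tb_per_day = written_tb / float(days_in_service)
    return (tbw_rating_tb - written_tb) / tb_per_day

# Example: a journal SSD rated for 70 TBW, in service for 90 days.
print("~%.0f days of rated endurance left" % days_left("/dev/sdb", 70, 90))

This only looks at one drive and ignores future rebalance events, so treat the output as an optimistic upper bound; the useful part is feeding it into whatever monitoring you already run and alerting well before it reaches zero.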