On 12/22/2015 05:36 PM, Tyler Bishop wrote:
> Write endurance is kinda bullshit.
>
> We have Crucial 960GB drives storing data and we've only managed to take
> 2% off the drives' life in a year, with hundreds of TB written weekly.
>
> Stuff is way more durable than anyone gives it credit for.

No, that is absolutely not true. I've seen multiple SSDs fail in Ceph
clusters. Small Samsung 850 Pro SSDs wore out within 4 months in heavily
write-intensive Ceph clusters.

> ----- Original Message -----
> From: "Lionel Bouton" <lionel+ceph@xxxxxxxxxxx>
> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Tuesday, December 22, 2015 11:04:26 AM
> Subject: Re: Intel S3710 400GB and Samsung PM863 480GB fio results
>
> Le 22/12/2015 13:43, Andrei Mikhailovsky a écrit :
>> Hello guys,
>>
>> Was wondering if anyone has done testing on the Samsung PM863 120 GB
>> version to see how it performs? IMHO the 480 GB version seems like a
>> waste for the journal, as you only need a small disk to fit 3-4 OSD
>> journals. Unless you get far greater durability.
>
> The problem is endurance. If we use the 480GB model for 3 OSDs each on
> the cluster we might build, we expect 3 years (with some margin for
> error, but not including any write amplification at the SSD level)
> before the SSDs fail.
> In our context a 120GB model might not even last a year (its endurance
> is 1/4 that of the 480GB model). This is why the SM863 models will
> probably be more suitable if you have access to them: you can use
> smaller ones which cost less and get more endurance (you'll have to
> check the performance, though; smaller models usually have lower IOPS
> and bandwidth).
>
>> I am planning to replace my current journal SSDs over the next month
>> or so and would like to find out if there is a good alternative to
>> Intel's 3700/3500 series.
>
> The 3700s are a safe bet (the 100GB model is rated for ~1.8 PBW). The
> 3500 models probably don't have enough endurance for many Ceph clusters
> to be cost effective: the 120GB model is only rated for 70 TBW, and you
> have to account for both client writes and rebalance events.
> I'm uneasy with SSDs expected to fail within the life of the system
> they are in: you can have a cascade effect where an SSD failure brings
> down several OSDs, triggering a rebalance which might make SSDs
> installed at the same time fail too. In the best case you then reach
> your min_size (>= 2), which blocks writes and prevents further SSD
> failures until you move the journals to fresh SSDs. If min_size = 1,
> you might actually lose data.
>
> If you plan to replace your current journal SSDs, I would make a
> staggered deployment over several months or a year so they don't all
> fail at the same time in case of an unforeseen problem. This also lets
> you evaluate the performance and behavior of a new SSD model with your
> hardware (there have been reports of performance problems with some
> combinations of RAID controllers and SSD models/firmware versions)
> without impacting your cluster's overall performance too much.
>
> When using SSDs for journals you have to monitor both:
> * the wear leveling (or something equivalent) of each SSD; SMART data
>   may not be available behind a RAID controller, but you can usually
>   still get the total amount of data written,
> * the client writes on the whole cluster.
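A minimal sketch of that per-SSD wear monitoring, assuming smartctl is
installed and the script runs with root privileges. The SMART attribute
names are vendor specific (Media_Wearout_Indicator on Intel and
Wear_Leveling_Count on Samsung are common, but check your model) and the
device paths below are placeholders:

#!/usr/bin/env python
# Sketch only: poll SMART wear data for journal SSDs.
# Attribute names and device paths are assumptions; adjust for your drives.
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]                  # placeholder journal SSDs
WEAR_ATTRS = ("Media_Wearout_Indicator", "Wear_Leveling_Count")
WRITTEN_ATTR = "Total_LBAs_Written"                 # attribute 241 on many drives

def smart_attributes(dev):
    """Return {attribute_name: (normalized_value, raw_value)} from 'smartctl -A'."""
    out = subprocess.check_output(["smartctl", "-A", dev]).decode()
    attrs = {}
    for line in out.splitlines():
        f = line.split()
        # Attribute rows look like:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(f) >= 10 and f[0].isdigit():
            attrs[f[1]] = (f[3], f[9])
    return attrs

for dev in DEVICES:
    attrs = smart_attributes(dev)
    wear = next((attrs[a][0] for a in WEAR_ATTRS if a in attrs), "n/a")
    written = attrs.get(WRITTEN_ATTR, ("n/a", "n/a"))[1]
    print("%s: wear (normalized, 100 = new) = %s, total LBAs written = %s"
          % (dev, wear, written))

Feeding this output into whatever monitoring you already run (Nagios,
Zabbix, Graphite, ...) lets you watch the wear trend next to the
cluster-wide client write rate.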
> And check periodically what expected lifespan is left for each of your
> SSDs based on their current state, average write speed, estimated write
> amplification (both due to the pool's size parameter and the SSD
> model's inherent write amplification) and the amount of data you expect
> rebalance events to move.
> Ideally you should make this computation before choosing the SSD
> models, but several variables are not always easy to predict and will
> probably change during the life of your cluster.
>
> Lionel
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
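To make the lifespan computation Lionel describes concrete, here is a
rough back-of-the-envelope sketch. Every number in it (endurance rating,
client write rate, rebalance volume, amplification factor, OSD counts)
is a hypothetical placeholder, not a measurement from this thread:

#!/usr/bin/env python
# Back-of-the-envelope journal-SSD lifespan estimate.
# All values are hypothetical placeholders; plug in your own drive rating,
# measured client write rate, pool size and expected rebalance volume.

TBW_RATING = 70.0                # drive endurance rating, TB written
CLIENT_WRITES_TB_PER_DAY = 0.5   # average client writes, cluster-wide
POOL_SIZE = 3                    # each client write hits 'size' journals
OSDS_TOTAL = 30                  # OSDs in the cluster
OSDS_PER_SSD = 3                 # journals colocated on each SSD
REBALANCE_TB_PER_YEAR = 20.0     # data you expect rebalances to rewrite
WRITE_AMPLIFICATION = 1.5        # SSD-internal amplification, model dependent

# Total data written through all journals per year, assuming client
# writes are spread evenly across the journal SSDs.
journal_tb_per_year = (CLIENT_WRITES_TB_PER_DAY * 365 * POOL_SIZE
                       + REBALANCE_TB_PER_YEAR)
ssd_count = OSDS_TOTAL / float(OSDS_PER_SSD)
per_ssd_tb_per_year = journal_tb_per_year / ssd_count * WRITE_AMPLIFICATION

years = TBW_RATING / per_ssd_tb_per_year
print("Estimated %.0f TB/year written per journal SSD" % per_ssd_tb_per_year)
print("Expected lifetime at the rated endurance: %.1f years" % years)

With these placeholder numbers a 70 TBW drive comes out at well under a
year, which lines up with the point above that small, low-endurance
models are risky as journal devices.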