Re: Intel S3710 400GB and Samsung PM863 480GB fio results

Andrei Mikhailovsky <andrei@xxxxxxxxxx> · Sat, 26 Dec 2015 14:01:32 +0000 (GMT)

Yes, indeed. It seems not to matter much if you do not have a write-intensive cluster.

We have Intel 520s that have been in production for over 2 years and have used only 5% of their life according to SMART. I've also used Samsung 840 Pro drives, which showed the same or similar figures over a year of usage. So I guess that for my purposes endurance is not such a big deal. However, the SSDs that I have absolutely suck performance-wise for the Ceph journal, especially the Samsung drives. That's the main reason for wanting the 3700/3500 or their equivalent.

Andrei

----- Original Message -----
> From: "Tyler Bishop" <tyler.bishop@xxxxxxxxxxxxxxxxx>
> To: "Lionel Bouton" <lionel+ceph@xxxxxxxxxxx>
> Cc: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Tuesday, 22 December, 2015 16:36:21
> Subject: Re:  Intel S3710 400GB and Samsung PM863 480GB fio results

> Write endurance is kinda bullshit.
> 
> We have Crucial 960GB drives storing data and we've only managed to take 2% off
> the drives' life over a year, with hundreds of TB written weekly.
> 
> 
> Stuff is way more durable than anyone gives it credit.
> 
> 
> ----- Original Message -----
> From: "Lionel Bouton" <lionel+ceph@xxxxxxxxxxx>
> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>, "ceph-users"
> <ceph-users@xxxxxxxxxxxxxx>
> Sent: Tuesday, December 22, 2015 11:04:26 AM
> Subject: Re:  Intel S3710 400GB and Samsung PM863 480GB fio results
> 
> On 22/12/2015 13:43, Andrei Mikhailovsky wrote:
>> Hello guys,
>>
>> Was wondering if anyone has done testing on Samsung PM863 120 GB version to see
>> how it performs? IMHO the 480GB version seems like a waste for the journal as
>> you only need to have a small disk size to fit 3-4 osd journals. Unless you get
>> a far greater durability.
> 
> The problem is endurance. If we use the 480GB model with 3 OSDs per SSD on
> the cluster we might build, we expect about 3 years (with some margin for
> error but not including any write amplification at the SSD level) before
> the SSDs fail.
> In our context a 120GB model might not even last a year (endurance is
> 1/4th of the 480GB model). This is why SM863 models will probably be
> more suitable if you have access to them: you can use smaller ones which
> cost less and get more endurance (you'll have to check the performance
> though, usually smaller models have lower IOPS and bandwidth).
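
The sizing argument above can be sketched as a small calculation. This is only a rough illustration: it assumes endurance scales linearly with capacity within one model line (which is what "1/4th of the 480GB model" implies) and a constant write rate. The function names and every number in the example are hypothetical placeholders, not vendor ratings.

```python
def scaled_tbw(reference_tbw: float, reference_gb: float, target_gb: float) -> float:
    """Estimate the endurance of a smaller or larger model in the same line.

    Assumption: rated TB written scales linearly with capacity.
    """
    return reference_tbw * target_gb / reference_gb


def lifespan_years(rated_tbw: float, tb_written_per_day: float) -> float:
    """Years until the rated endurance is used up at a constant write rate."""
    if tb_written_per_day <= 0:
        raise ValueError("tb_written_per_day must be positive")
    return rated_tbw / (tb_written_per_day * 365.0)


if __name__ == "__main__":
    # Hypothetical figures for illustration only (not taken from a datasheet).
    tbw_480 = 1200.0      # assumed rating of the 480GB model, in TB written
    daily_writes = 1.1    # assumed journal writes per SSD, in TB per day

    tbw_120 = scaled_tbw(tbw_480, reference_gb=480, target_gb=120)
    print("480GB model: %.1f years" % lifespan_years(tbw_480, daily_writes))
    print("120GB model: %.1f years" % lifespan_years(tbw_120, daily_writes))
```

With the same write load, the smaller model's estimated lifespan is a quarter of the larger one's, which is how a 3 year estimate turns into less than a year.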
> 
>> I am planning to replace my current journal SSDs over the next month or so and
>> would like to find out if there is a good alternative to Intel's
>> 3700/3500 series.
> 
> 3700 are a safe bet (the 100GB model is rated for ~1.8PBW). 3500 models
> probably don't have enough endurance for many Ceph clusters to be cost
> effective. The 120GB model is only rated for 70TBW and you have to
> consider both client writes and rebalance events.
> I'm uneasy with SSDs expected to fail within the life of the system they
> are in: you can have a cascade effect where an SSD failure brings down
> several OSDs triggering a rebalance which might make SSDs installed at
> the same time fail too. In this case in the best scenario you will reach
> your min_size (>=2) and block any writes which would prevent more SSD
> failures until you move journals to fresh SSDs. If min_size = 1 you
> might actually lose data.
> 
> If you plan to replace your current journal SSDs, I would stagger the
> deployment over several months to a year, to avoid them all
> failing at the same time in case of an unforeseen problem. In addition
> this would allow you to evaluate the performance and behavior of a new SSD
> model with your hardware (there have been reports of performance
> problems with some combinations of RAID controllers and SSD
> models/firmware versions) without impacting your cluster's overall
> performance too much.
> 
> When using SSDs for journals you have to monitor both:
> * the wear leveling indicator of each SSD, or something equivalent (SMART
> data may not be available if you use a RAID controller, but usually you
> can get the total amount of data written),
> * the client writes on the whole cluster.
> And periodically check the expected remaining lifespan of each
> of your SSDs based on their current state, average write speed, estimated
> write amplification (both due to pool's size parameter and the SSD
> model's inherent write amplification) and the amount of data moved by
> rebalance events you expect to happen.
> Ideally you should make this computation before choosing the SSD models,
> but several variables are not always easy to predict and probably will
> change during the life of your cluster.
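
The periodic check described above could look roughly like the sketch below. It is a simplified model, not a definitive method: it assumes client writes are journaled once per replica and spread evenly over all journal SSDs, and that the SSD-level write amplification factor applies uniformly to everything written. The function name, its parameters and the example figures are all hypothetical.

```python
def remaining_years(
    rated_tbw: float,                  # endurance rating of the SSD, in TB written
    tb_written: float,                 # TB already written (SMART or controller counter)
    cluster_client_tb_per_day: float,  # client writes on the whole cluster
    pool_size: int,                    # replica count: each write is journaled pool_size times
    journal_ssd_count: int,            # journal SSDs sharing the load (assumed evenly)
    ssd_write_amp: float = 1.0,        # estimated write amplification of the SSD model
    rebalance_tb_per_year: float = 0.0,  # expected rebalance traffic landing on this SSD
) -> float:
    """Estimate the years left before the SSD reaches its rated endurance."""
    remaining_tb = max(rated_tbw - tb_written, 0.0)
    # Journal writes reaching this SSD per day, before SSD-level amplification.
    journal_tb_per_day = cluster_client_tb_per_day * pool_size / journal_ssd_count
    yearly_tb = (journal_tb_per_day * 365.0 + rebalance_tb_per_year) * ssd_write_amp
    if yearly_tb <= 0:
        return float("inf")  # no writes expected, so no wear in this model
    return remaining_tb / yearly_tb


if __name__ == "__main__":
    # Hypothetical cluster: 4 TB/day of client writes, size=3, 12 journal SSDs.
    years = remaining_years(
        rated_tbw=1800.0,
        tb_written=250.0,
        cluster_client_tb_per_day=4.0,
        pool_size=3,
        journal_ssd_count=12,
        ssd_write_amp=1.5,
        rebalance_tb_per_year=20.0,
    )
    print("estimated remaining lifespan: %.1f years" % years)
```

Rerunning this with fresh counters every few weeks shows whether the estimate is drifting, which matters because the write rate and rebalance volume are exactly the variables that are hard to predict up front.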
> 
> Lionel
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com