Re: SSD journals killed by VMs generating 500 IOPs (4kB) non-stop for a month, seemingly because of a syslog-ng bug

On 11/23/2015 10:42 AM, Eneko Lacunza wrote:
Hi Mart,

El 23/11/15 a las 10:29, Mart van Santen escribió:


On 11/22/2015 10:01 PM, Robert LeBlanc wrote:
There have been numerous reports on the mailing list of the Samsung EVOs and
Pros failing far before their expected wear. This is most likely due
to the 'uncommon' workload of Ceph: the controllers of those drives
are not really designed to handle the continuous direct sync writes
that Ceph does. Because of this they can fail without warning
(controller failure rather than MLC failure).

I'm new to the mailing list and am currently scanning the archive, and I'm getting a sense of the quality of the Samsung Evo disks. If I understand correctly, the advice is at least to put DC-grade journals in front of them to save them a bit from failure, for example Intel 750's.
I don't think Intel 750's are DC grade. I don't have any of them though.

OK thx. Interesting, as the 750's are more expensive here than some Intel S3xxx (per $).


However, is there any experience of when the Evo's fail in the Ceph scenario? For example, if wear leveling according to SMART is at about 40%, is it time to replace your disks? Or is it just random? Actually we are mostly using Crucial drives (m550, mx200's), and there is not a lot about them on the list. Do other people use them, and what is their experience so far? I expect about the same quality as the Samsung Evo's, but I'm not sure if that is the correct conclusion.
My experience with the Samsung 840 Pro is that they can't be used for Ceph at all. As for the Crucial M550, they are slow and have little endurance for Ceph use, but I have used them and they seemed reliable during their warranty lifetime (we retired them for performance reasons).

They are indeed not the fastest in the world, but our cluster isn't that heavily used (max ~4000 IOPS for the whole cluster, ~50 1TB disks currently). Disk IO is currently between 5% and 20% usage.
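
For reference, that Disk IO figure is essentially what iostat reports as %util. A minimal sketch of how such a 'busy' percentage can be derived from /proc/diskstats (this is not our actual monitoring code, and 'sda' is just a placeholder device name):

#!/usr/bin/env python3
# Sketch only: compute a disk 'busy' percentage the same way iostat's %util does.
import time

def io_ticks(device):
    # Field 13 of /proc/diskstats is the total milliseconds spent doing I/O.
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[12])
    raise ValueError("device %s not found" % device)

def busy_percent(device, interval=1.0):
    # Time spent doing I/O during the sample window, divided by wall-clock time.
    before = io_ticks(device)
    time.sleep(interval)
    after = io_ticks(device)
    return 100.0 * (after - before) / (interval * 1000.0)

print("sda busy: %.1f%%" % busy_percent("sda"))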



About SSD failure in general, do they normally fail hard, or do they just get unbearably slow? We do measure/graph the disks' 'busy' percentage, and use that as an indicator of whether a disk is getting slow. Is this a sensible approach?

Just don't do it. Use DC SSDs, like Intel S3xxx or Samsung DC Pro, or something like that. You will save a lot of time and effort, and possibly also money.

It's clear to me that the Intel SSDs are the way to go. However, I still want to estimate at what pace we should replace them in the future, so it still makes sense for us to have some way of predicting when it's time to speed up replacements.
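
To make that concrete, here is a minimal sketch of the kind of back-of-the-envelope estimate I have in mind: read the normalised SMART wear value with smartctl and extrapolate linearly, assuming the write load stays roughly constant. The attribute name is vendor-specific (e.g. Wear_Leveling_Count on Samsung, Media_Wearout_Indicator on Intel), and the threshold of 40 is just the figure from my earlier question, not a recommendation:

#!/usr/bin/env python3
# Sketch only: linearly project when an SSD will hit a wear threshold.
# Needs root (or sudo) to run smartctl against the device.
import subprocess

def wear_value(device, attr="Wear_Leveling_Count"):
    # Column 4 (VALUE) of 'smartctl -A' is the normalised attribute: 100 = new.
    out = subprocess.check_output(["smartctl", "-A", device]).decode()
    for line in out.splitlines():
        if attr in line:
            return int(line.split()[3])
    raise ValueError("attribute %s not found on %s" % (attr, device))

def days_until_threshold(device, days_in_service, threshold=40):
    remaining = wear_value(device)      # e.g. 85 means roughly 15% worn
    worn = 100 - remaining
    if worn <= 0:
        return float("inf")             # no measurable wear yet
    wear_per_day = worn / float(days_in_service)
    return (remaining - threshold) / wear_per_day

# Example: a journal SSD that has been in service for 180 days.
print("days until threshold: %.0f" % days_until_threshold("/dev/sda", 180))

Of course this only covers plain wear; it says nothing about the sudden controller failures reported for the EVOs.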


Regards,

Mart van Santen


Cheers
Eneko
-- 
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
      943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es



-- 
Mart van Santen
Greenhost
E: mart@xxxxxxxxxxxx
T: +31 20 4890444
W: https://greenhost.nl

A PGP signature can be attached to this e-mail,
you need PGP software to verify it. 
My public key is available in keyserver(s)
see: http://tinyurl.com/openpgp-manual

PGP Fingerprint: CA85 EB11 2B70 042D AF66  B29A 6437 01A1 10A3 D3A5


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
