Hi Andrija,

Thanks for your prompt response. Would it be possible to share your hardware configuration, including your server information? Secondly, is there any way to duplicate your workload with fio-rbd, rbd bench, or rados bench?

"so 2 SSDs in 3 servers vanished in...2-3 weeks, after a 3-4 months of being in production (VMs/KVM/CloudStack)"

To confirm: you deployed Ceph with CloudStack, correct? And the 2 SSDs that vanished within 2-3 weeks were brand-new Samsung 850 Pro 128GB drives, right?

Thanks,
James
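For reference, a minimal sketch of the reproduction tools mentioned above. The pool name `bench` and image name `test-img` are placeholders, not names from this thread, and the commands assume a test cluster where destroying benchmark data is acceptable:

```shell
# Create a throwaway pool (name and pg count are examples), then drive
# 4 MB writes against it for 60 seconds with 16 concurrent operations.
ceph osd pool create bench 128
rados bench -p bench 60 write -b 4M -t 16 --no-cleanup

# Sequential reads of the objects just written, then clean them up.
rados bench -p bench 60 seq -t 16
rados -p bench cleanup

# Alternatively, exercise RBD directly with fio's rbd ioengine
# (assumes an existing image "test-img" in pool "bench"):
fio --name=rbd-write --ioengine=rbd --pool=bench --rbdname=test-img \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 --runtime=60 --time_based
```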
From: Andrija Panic [mailto:andrija.panic@xxxxxxxxx]
Hi James,

I had 3 Ceph nodes, each set up as follows: 12 OSDs (HDD) and 2 SSDs (6 journal partitions on each SSD). The SSDs just vanished with no warning, no smartctl errors, nothing... so the 2 SSDs in each of the 3 servers vanished within 2-3 weeks, after 3-4 months of being in production (VMs/KVM/CloudStack).

Best,
Andrija

On 4 September 2015 at 19:27, James (Fei) Liu-SSI <james.liu@xxxxxxxxxxxxxxx> wrote:
Hi Quentin and Andrija,
Thanks so much for reporting the problems with the Samsung drives.
Would it be possible to get to know your system configuration? What kind of workload are you running? Are you using the Samsung SSDs as separate journaling disks?
Thanks so much.
James

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Quentin Hartman

Yeah, we've ordered some S3700s to replace them already. They should be here early next week. Hopefully they arrive before multiple nodes die at once and we can no longer rebalance successfully. Most of the drives I have are the 850 Pro 128GB (specifically MZ7KE128HMGA). There are a couple of 120GB 850 EVOs in there too, but ironically, none of them have pooped out yet.

On Thu, Sep 3, 2015 at 1:58 PM, Andrija Panic <andrija.panic@xxxxxxxxx> wrote:
I really advise removing the bastards before they die... no rebalancing happening, just a temporary OSD-down while replacing the journals... What size and model are your Samsungs?

On Sep 3, 2015 7:10 PM, "Quentin Hartman" <qhartman@xxxxxxxxxxxxxxxxxxx> wrote:
We also just started having our 850 Pros die one after the other after about 9 months of service. 3 down, 11 to go... No warning at all: the drive is fine, and then it's not even visible to the machine. According to the stats in hdparm and the calculations I did, they should have had years of life left, so it seems that Ceph journals definitely do something the drives do not like, which is not reflected in their stats.

QH

On Wed, Aug 26, 2015 at 7:15 AM, 10 minus <t10tennn@xxxxxxxxx> wrote:
Hi,

We got a good deal on 843Ts and we are using them as journals in our OpenStack setup.
When we compared them with Intel SSDs (I think it was the S3700), they were a shade slower for our workload and considerably cheaper. We did not run any synthetic benchmarks since we had a specific use case; the performance was better than our old setup, so it was good enough.

hth

On Tue, Aug 25, 2015 at 12:07 PM, Andrija Panic <andrija.panic@xxxxxxxxx> wrote:
We have some 850 Pro 256GB SSDs if anyone is interested in buying them :) There was also a new 850 Pro firmware that broke people's disks and was later revoked, etc... I'm sticking with only vacuum cleaners from Samsung for now, maybe... :)

On Aug 25, 2015 12:02 PM, "Voloshanenko Igor" <igor.voloshanenko@xxxxxxxxx> wrote:
To be honest, the Samsung 850 PRO is not a 24/7 series; it's more of a desktop+ series, and in any case the results from these drives were very, very bad in any realistic scenario... Possibly the 845 PRO is better, but we don't want to experiment anymore... So we chose the S3500 240GB. Yes, it's cheaper than the S3700 (about 2x), and not as durable for writes, but we think it's better to replace 1 SSD per year than to pay double the price now.

2015-08-25 12:59 GMT+03:00 Andrija Panic <andrija.panic@xxxxxxxxx>:
And should I mention that in another Ceph installation we had Samsung 850 Pro 128GB drives, and all 6 SSDs died within a 2-month period: they simply disappeared from the system, so it wasn't wear-out... We'll never buy Samsung again :)

On Aug 25, 2015 11:57 AM, "Andrija Panic" <andrija.panic@xxxxxxxxx> wrote:
First, please read: we are getting 200 IOPS, compared to 18,000 IOPS on the Intel S3500. Those are sustained performance numbers, meaning they avoid the drive's cache and hold up over a longer period of time... We observed the original issue as high speed at the beginning of, e.g., a file transfer inside a VM, which then halts to zero... We moved the journals back to the HDDs and performance was acceptable... now we are upgrading to the Intel S3500...

Best, any details on that?
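The 200 vs. 18,000 IOPS gap described above is the classic symptom of consumer SSDs collapsing under synchronous writes, since the Ceph journal issues O_DSYNC writes. A common community test for journal suitability is a single-threaded 4k synchronous write with fio; this is a sketch, not a command from the thread, and /dev/sdX is a placeholder for the SSD under test:

```shell
# WARNING: this writes directly to the raw device and destroys any data on it.
# A journal-suitable SSD (e.g. Intel S3500/S3700) sustains thousands of IOPS
# here; many consumer drives drop to a few hundred.
fio --name=journal-test --filename=/dev/sdX \
    --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
```

With --sync=1 and --direct=1, every 4k write must reach stable media before the next is issued, which is close to how the journal behaves and bypasses the drive cache that inflates short benchmarks.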
-- Andrija Panić |
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com