Re: which SSD / experiences with Samsung 843T vs. Intel s3700

No, they are dead dead dead. Can't get anything off of them. If you look back further on this thread I think the most noteworthy part of this whole experience is just how far off my write estimates were. The ones that have not died have somewhere between 24 and 32 TB written to them after 9 months in service. This is almost 4x what I thought they would get.

QH 

On Fri, Sep 18, 2015 at 1:48 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
"850 PRO" is a workstation drive. You shouldn't put it in the server...
But it should not just die either way, so don't tell them you use it for Ceph next time.

Do the drives work when replugged? Can you get anything from SMART?

Jan


On 18 Sep 2015, at 02:57, James (Fei) Liu-SSI <james.liu@xxxxxxxxxxxxxxx> wrote:

Hi Quentin,
Samsung makes different types of SSDs for different types of workloads, using different media such as SLC, MLC, TLC, and 3D NAND. Each is designed for a particular workload and purpose. Thanks for your understanding and support.
 
Regards,
James
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Quentin Hartman
Sent: Thursday, September 17, 2015 4:05 PM
To: Andrija Panic
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: which SSD / experiences with Samsung 843T vs. Intel s3700
 
I ended up having 7 total die: 5 while in service, and 2 more when I hooked them up to a test machine to collect information from them. To Samsung's credit, they've been great to deal with and are replacing the failed drives, on the condition that I don't use them for ceph again. Apparently they sent some of my failed drives to an engineer in Korea, who did a failure analysis on them and came to the conclusion that they were put to an "unintended use". I have seven left that I'm not sure what to do with.
 
I've honestly always really liked Samsung, and I'm disappointed that I wasn't able to find anyone with their DC-class drives actually in stock, so I ended up switching to the Intel S3700s. My users will be happy to have some SSDs to put in their workstations though!
 
QH
 
On Thu, Sep 17, 2015 at 4:49 PM, Andrija Panic <andrija.panic@xxxxxxxxx> wrote:
Another one bites the dust...
 
This is a Samsung 850 PRO 256GB... (the 6 journals on this SSD just died...)
 
[root@cs23 ~]# smartctl -a /dev/sda
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.10.66-1.el6.elrepo.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
 
Vendor:               /1:0:0:0
Product:
User Capacity:        600,332,565,813,390,450 bytes [600 PB]
Logical block size:   774843950 bytes
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options
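
The capacity and block size in that output (600 PB, a 774843950-byte logical block) are clearly garbage, which is a typical sign that the drive is no longer answering identity queries sensibly. A minimal Python sketch of flagging that automatically from saved smartctl text might look like the following. The function names and the plausibility thresholds are assumptions made for illustration; they are not part of smartmontools.

```python
import re

# Assumed sanity limits, for illustration only; adjust for your hardware.
MAX_PLAUSIBLE_CAPACITY_BYTES = 100 * 10**12   # 100 TB
PLAUSIBLE_BLOCK_SIZES = {512, 4096}


def parse_smartctl_fields(text):
    """Pull capacity and logical block size (bytes) out of smartctl -a text."""
    fields = {}
    cap = re.search(r"User Capacity:\s*([\d,]+)\s*bytes", text)
    if cap:
        fields["capacity_bytes"] = int(cap.group(1).replace(",", ""))
    blk = re.search(r"Logical block size:\s*(\d+)\s*bytes", text)
    if blk:
        fields["block_size_bytes"] = int(blk.group(1))
    return fields


def implausible_fields(fields):
    """Return a list of reasons the parsed values look wrong (empty if none)."""
    reasons = []
    capacity = fields.get("capacity_bytes")
    if capacity is None:
        reasons.append("capacity missing")
    elif capacity > MAX_PLAUSIBLE_CAPACITY_BYTES:
        reasons.append("capacity %d bytes is implausibly large" % capacity)
    block = fields.get("block_size_bytes")
    if block is None:
        reasons.append("block size missing")
    elif block not in PLAUSIBLE_BLOCK_SIZES:
        reasons.append("block size %d bytes is not a known sector size" % block)
    return reasons
```

Feeding it the two lines from the output above should flag both the capacity and the block size.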
 
On 8 September 2015 at 18:01, Quentin Hartman <qhartman@xxxxxxxxxxxxxxxxxxx> wrote:
On Tue, Sep 8, 2015 at 9:05 AM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
A list of hardware that is known to work well would be incredibly
valuable to people getting started. It doesn't have to be exhaustive,
nor does it have to provide all the guidance someone could want. A
simple "these things have worked for others" would be sufficient. If
nothing else, it will help people justify more expensive gear when their
approval people say "X seems just as good and is cheaper, why can't we
get that?".


So I have my opinions on different drives, but I think we do need to be really careful not to appear to endorse or pick on specific vendors. The more we can stick to high-level statements like:

- Drives should have high write endurance
- Drives should perform well with O_DSYNC writes
- Drives should support power loss protection for data in motion

The better, I think.  Once those are established, I think it's reasonable to point out that certain drives meet (or do not meet) those criteria and get feedback from the community as to whether or not vendors' marketing actually reflects reality.  It'd also be really nice to see more information available, like the actual hardware (capacitors, flash cells, etc.) used in the drives.  I've had to show photos of the innards of specific drives to vendors to get them to give me accurate information regarding certain drive capabilities.  Having a database of such things available to the community would be really helpful.
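
As a rough illustration of the O_DSYNC criterion above, here is a minimal Python sketch that times small synchronous writes to a file. It assumes a Linux host where os.O_DSYNC is available, and the function name and defaults (4 KiB blocks, 100 writes) are hypothetical choices, not an established benchmark. It writes through a filesystem rather than to the raw device, so treat the numbers as indicative only.

```python
import os
import time


def time_dsync_writes(path, count=100, block_size=4096):
    """Write `count` blocks with O_DSYNC; return mean seconds per write."""
    # O_DSYNC makes each write() return only after the data is on stable media.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DSYNC
    fd = os.open(path, flags, 0o600)
    buf = b"\0" * block_size
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, buf)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed / count
```

Calling time_dsync_writes on a scratch file located on the SSD under test gives a per-write latency that can be compared between drives. The path must point at a throwaway file, since it is truncated on open.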

 
That's probably a very good approach. I think it would be pretty simple to avoid the appearance of endorsement if the data is presented correctly.
 

To that point, though, I think perhaps something more important than a
list of known "good" hardware would be a list of known "bad" hardware,

I'm rather hesitant to do this unless it's been specifically confirmed by the vendor.  It's too easy to point fingers (see the recent kernel trim bug situation).
 
I disagree. I think that only comes into play if you claim to know why the hardware has problems. In this case, if you simply state "people who have used this drive have experienced a large number of seemingly premature failures when using them as journals", that provides sufficient warning to users. If the vendor wants to engage the community and potentially pin down why, and help us find a way to make the device work or confirm that it's just not suited, then that's on them. Samsung seems to be doing exactly that. It would be great to have them help provide that level of detail, but again, I don't think it's necessary. We're not saying "ceph/redhat/$whatever says this hardware sucks", we're saying "The community has found that this hardware exhibits these negative behaviors when used with ceph...". At that point you're just relaying experiences and collecting them in a central location. It's up to the reader to draw conclusions from it.
 
But again, I think more important than either of these would be a collection of use cases with the actual journal write volumes that have occurred in those use cases, so that people can make more informed purchasing decisions. The fact that my small openstack cluster created 3.6 TB of writes per month on my journal drives (3 OSDs each) is somewhat mind-blowing. That's almost four times the amount of writes my best-guess estimates indicated we'd be doing. Clearly there's more going on than we are used to paying attention to. Someone coming to ceph and seeing the cost of DC-class SSDs versus consumer-class SSDs will almost certainly suffer from some amount of sticker shock, and even if they don't, their purchasing approval people almost certainly will. This is especially true for people in smaller organizations where SSDs are still somewhat exotic. And when they come back with "Why won't cheaper thing X be OK?", they need to have sufficient information to answer that. Without a test environment to generate data with, they will need to rely on the experiences of others, and right now those experiences don't seem to be documented anywhere, and if they are, they are not very discoverable.
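
To make the arithmetic concrete, here is a minimal Python sketch of turning an observed write volume into a rough drive-lifetime estimate. The 3.6 TB/month figure comes from the message above and is treated here as a per-drive rate, which is an assumption. The 150 TB endurance rating is a hypothetical placeholder; substitute the figure from the vendor's datasheet for the drive in question.

```python
# Hypothetical endurance rating in TB written; take the real one from the datasheet.
HYPOTHETICAL_RATED_TBW = 150.0


def monthly_writes_tb(total_tb_written, months_in_service):
    """Average TB written per month over the service period."""
    return total_tb_written / months_in_service


def months_until_rated_endurance(rated_tbw, monthly_tb):
    """Months until writes reach the rated endurance at a constant rate."""
    return rated_tbw / monthly_tb


if __name__ == "__main__":
    observed = 3.6            # TB/month, from the message above
    estimated = observed / 4  # "almost four times" the original guess
    for label, rate in (("estimated", estimated), ("observed", observed)):
        months = months_until_rated_endurance(HYPOTHETICAL_RATED_TBW, rate)
        print("%s: %.1f TB/month -> about %.0f months" % (label, rate, months))
```

The point of the comparison is that a fourfold error in the write estimate shortens the projected lifetime by the same factor, which is exactly the kind of data a purchasing justification needs.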
 
QH
 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 
-- 
 
Andrija Panić
 