Here is another piece of work by IBM (probably against a similar dataset from
Backblaze), which is pretty impressive:
https://www.ibm.com/blogs/research/2016/08/predicting-disk-failures-reliable-clouds/

On Tue, Nov 14, 2017 at 10:19 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Tue, 14 Nov 2017, Piotr Dałek wrote:
>> On 17-11-14 05:09 AM, Ric Wheeler wrote:
>> > On 11/13/2017 05:23 PM, Piotr Dałek wrote:
>> > > This may not work anyway, because many controllers (including JBOD
>> > > controllers) don't pass through SMART data, or the data doesn't make
>> > > sense.
>> >
>> > You are right that many controllers don't pass this information without
>> > going through their non-open-source tools. The libstoragemgmt project -
>> > https://github.com/libstorage/libstoragemgmt - has added support for
>> > some types of access to the physical back-end drives. I think it is
>> > worth syncing up with them to see how we might be able to extract the
>> > interesting bits.
>>
>> There's another problem - bcache/flashcache/<insert your favorite vendor>
>> cache. OSDs often reside on top of some cache device, and accessing SMART
>> values for that might not work, or might not return all required values.
>
> For devicemapper devices at least it is pretty straightforward to work out
> the underlying physical device.
>
> I'm sure there will always be some devices and stacks that successfully
> obscure the reliability data, but most deployments will benefit.
>
>> > There is also a lot of information about drive failures (SSD and
>> > spinning) at USENIX FAST over many years. Things have improved a lot
>> > over the years, especially with modern SSDs and NVMe, where a lot of
>> > hard work has gone into adding improved metrics to the data.
>>
>> That's my point. That's a lot of statistics to chew through, and most of
>> it relies on assumptions that can already be wrong or become wrong some
>> time later. All it takes is a brand-new product line with different
>> characteristics. SSDs are different - you just measure the number of
>> erase/program cycles and (again) make assumptions based on that - which
>> is easier and more reliable.
>> Still, I would be *very* unhappy to be woken up in the middle of the
>> night only to find that the cluster had incorrectly predicted a disk
>> failure, and my company (and I'm pretty sure not only my company)
>> wouldn't be happy either if the cluster forced it to throw away perfectly
>> good disks, because reusing them would yield the same result.
>> On the other hand, this creates a back door for vendors to force device
>> replacement even when a device is perfectly fine; some SSD vendors
>> already do this, with their devices going into read-only mode even when
>> there are plenty of P/E cycles left in the flash cells. I don't think we
>> need Ceph to go this way.
>
> OT: I view building good prediction models as an orthogonal problem, and
> one that relies on collecting a large data set. Patrick McGarry and
> several others are working on a related project to build a public data
> set of SMART and other reliability data so that such models can be built
> for use in open systems. Current data sets from Backblaze suffer from a
> small set of device models, which means only large cloud providers or
> system vendors with large deployments are able to gather enough health
> metrics and failure data to build good models. The goal of the other
> project is to allow regular users (of systems like Ceph) to opt into
> sharing reliability data so that better models can be built--ones that
> cover a broader range of devices.
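To make the devicemapper/bcache point above concrete, here is a minimal
sketch of how the underlying physical device(s) can be worked out on Linux.
It assumes only the standard sysfs layout, where stacked block devices
(dm-*, md*, bcache*) list their components under
/sys/class/block/<dev>/slaves; the script and device names are illustrative,
not part of Ceph or libstoragemgmt:

#!/usr/bin/env python3
# Illustrative sketch: resolve a stacked block device (dm/LVM/md/bcache)
# down to the physical disk(s) underneath it by walking sysfs 'slaves' links.
import os
import sys

SYSFS = "/sys/class/block"

def parent_disk(name):
    # If 'name' is a partition (e.g. sda2), step up to the whole disk (sda).
    path = os.path.realpath(os.path.join(SYSFS, name))
    if os.path.exists(os.path.join(path, "partition")):
        return os.path.basename(os.path.dirname(path))
    return name

def physical_devices(name, seen=None):
    # Recursively follow 'slaves' entries until we reach devices that have
    # none, i.e. the real disks at the bottom of the stack.
    seen = set() if seen is None else seen
    name = parent_disk(name)
    if name in seen:
        return set()
    seen.add(name)
    slaves_dir = os.path.join(SYSFS, name, "slaves")
    slaves = os.listdir(slaves_dir) if os.path.isdir(slaves_dir) else []
    if not slaves:
        return {name}
    leaves = set()
    for slave in slaves:
        leaves |= physical_devices(slave, seen)
    return leaves

if __name__ == "__main__":
    # e.g.  python3 resolve_physical.py dm-3   or   ... bcache0
    print(sorted(physical_devices(sys.argv[1])))

The leaf device names this returns are what you would actually point
smartctl or libstoragemgmt at; a dm-crypt, LVM or bcache layer on top then
no longer hides the physical disk, although controllers that refuse to pass
SMART through remain a separate problem.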
>
>> tl;dr - I'm fine with that feature as long as there'll be a possibility
>> to disable it entirely.
>
> Of course!
>
> sage

--
Regards
Huang Zhiteng