Re: SMART disk monitoring

Piotr Dałek <piotr.dalek@xxxxxxxxxxxx> · Mon, 13 Nov 2017 12:53:16 +0100

On 17-11-12 09:16 PM, Sage Weil wrote:
On Sun, 12 Nov 2017, Lars Marowsky-Bree wrote:
On 2017-11-10T22:36:46, Yaarit Hatuka <yaarit@xxxxxxxxx> wrote:

Many thanks! I'm very excited to join Ceph's outstanding community!
I'm looking forward to working on this challenging project, and I'm
very grateful for the opportunity to be guided by Sage.

That's all excellent news!

Can we discuss, though, whether and how this belongs in ceph-osd, given that
this data can be (and already is) collected via smartmon, either through
prometheus or, I assume, collectd as well? Does this really need to be added
to the OSD code?

Would the goal be for them to report this to ceph-mgr, or expose
directly as something to be queried via, say, a prometheus exporter
binding? Or are the OSDs supposed to directly act on this information?

The OSD is just a convenient channel; it needn't be the only one.

Part 1 of the project is to get JSON output out of smartctl so we don't have
to rely on one of the many crufty projects floating around that parse its
weird output; that'll presumably be helpful to all consumers.
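
For illustration, here is roughly what a consumer could do once smartctl
emits JSON instead of its human-readable tables. This is a minimal Python
sketch; the layout (an `ata_smart_attributes.table` list whose entries carry
`id`, `name`, `value`, `thresh` and `raw.value`) is an assumption modeled on
the JSON that smartmontools 7.0 later added via `smartctl --json`, and the
sample data is made up.

```python
import json

# Assumed layout, modeled on smartctl's JSON output; trimmed, illustrative data.
SAMPLE = """
{
  "device": {"name": "/dev/sda"},
  "ata_smart_attributes": {
    "table": [
      {"id": 5, "name": "Reallocated_Sector_Ct", "value": 100, "thresh": 10,
       "raw": {"value": 0}},
      {"id": 9, "name": "Power_On_Hours", "value": 92, "thresh": 0,
       "raw": {"value": 7311}}
    ]
  }
}
"""


def parse_attributes(text):
    """Return {attribute_id: {"name", "value", "thresh", "raw"}} from JSON text."""
    doc = json.loads(text)
    # NVMe and SCSI devices report health data in other sections, so a
    # missing ATA attribute table yields an empty dict instead of an error.
    table = doc.get("ata_smart_attributes", {}).get("table", [])
    return {
        row["id"]: {
            "name": row["name"],
            "value": row["value"],
            "thresh": row["thresh"],
            "raw": row["raw"]["value"],
        }
        for row in table
    }


attrs = parse_attributes(SAMPLE)
print(attrs[9]["name"], attrs[9]["raw"])  # Power_On_Hours 7311
```

Every consumer then shares one `json.loads` call instead of maintaining its
own screen-scraper.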

That means a new patch to smartctl itself, right?

Part 2 is to map OSDs to host:device pairs; that has already merged.
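
A rough illustration of why that mapping matters: once each OSD reports its
host and the devices it consumes, a per-device view (useful when one device,
e.g. a shared DB/WAL SSD, backs several OSDs) is a simple inversion. The
input shape below (`hostname` and `devices` keys) and the `osds_by_device`
helper are hypothetical, not the actual OSD metadata schema.

```python
from collections import defaultdict

# Hypothetical per-OSD metadata: the host each OSD runs on and the local
# block devices it consumes. Field names are illustrative only.
OSD_METADATA = {
    0: {"hostname": "node1", "devices": ["sda", "nvme0n1"]},
    1: {"hostname": "node1", "devices": ["sdb", "nvme0n1"]},
    2: {"hostname": "node2", "devices": ["sda"]},
}


def osds_by_device(metadata):
    """Invert OSD -> devices into "host:device" -> sorted list of OSD ids."""
    index = defaultdict(list)
    for osd_id, meta in metadata.items():
        for dev in meta["devices"]:
            # Device names are only unique per host, so qualify them with it.
            index[f"{meta['hostname']}:{dev}"].append(osd_id)
    return {key: sorted(ids) for key, ids in index.items()}


mapping = osds_by_device(OSD_METADATA)
print(mapping["node1:nvme0n1"])  # [0, 1]
```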

Part 3 is to gather the actual data.  The prototype has the OSD polling
this because it (1) knows which devices it consumes and (2) is present on
every node.  We're contemplating a per-host ceph-volume-agent for
assisting with OSD (de)provisioning (i.e., running ceph-volume); that
could be an option.  Or if some other tool is already scraping it and can
be queried, that would work too.

I think the OSD will end up being a necessary path (perhaps among many),
though, because when we are using SPDK I don't think we'll be able to get
the SMART data via smartctl (or any other tool) at all because the OSD
process will be running the NVMe driver.

This may not work anyway, because many controllers (including JBOD
controllers) don't pass SMART data through, or the data they do pass don't
make sense.
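
A minimal sketch of the Part 3 collection step, written so the data source
is pluggable (an OSD, a per-host agent, or an existing scraper could supply
it) and so that devices whose controller doesn't pass SMART data through are
recorded as unavailable rather than breaking the whole scrape.
`collect_health` and the reader callables are hypothetical; a real reader
would shell out to smartctl or, under SPDK, have to be served by the OSD
itself.

```python
import time


def collect_health(devices, read_smart, now=time.time):
    """Poll each device once via read_smart(device) -> dict or None.

    Returns (samples, unavailable): samples maps device -> {"ts", "data"};
    unavailable lists devices that returned None or an empty result,
    e.g. because they sit behind a controller without SMART pass-through.
    """
    samples, unavailable = {}, []
    for dev in devices:
        try:
            data = read_smart(dev)
        except OSError:
            # Treat I/O or missing-tool errors like "no data" for this device.
            data = None
        if not data:
            unavailable.append(dev)
            continue
        samples[dev] = {"ts": now(), "data": data}
    return samples, unavailable


def fake_reader(dev):
    # Hypothetical stand-in for smartctl: sdb sits behind a RAID controller.
    return {"5": 0} if dev == "sda" else None


samples, missing = collect_health(["sda", "sdb"], fake_reader, now=lambda: 0.0)
print(sorted(samples), missing)  # ['sda'] ['sdb']
```

Reporting "unavailable" explicitly lets whatever consumes the data tell a
healthy disk apart from one we simply can't see.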

Part 4 is to archive the results.  The original thought was to dump it
into RADOS.  I hadn't considered prometheus, but that might be a better
fit!  I'm generally pretty cautious about introducing dependencies like
this, but we're already expecting prometheus to be used for other metrics
for the dashboard.  I'm not sure whether prometheus' query interface lends
itself to the failure models, though...
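
If prometheus does end up being the archive, the collector only has to
render samples in Prometheus's plain-text exposition format and let the
server scrape and retain them. A minimal sketch; the metric name
`smart_attribute_raw`, its labels and `render_prometheus` are hypothetical,
not an existing exporter's schema.

```python
def render_prometheus(samples):
    """Render {(host, device): {attr_id: raw_value}} as exposition-format text.

    Label values are assumed not to contain quotes, backslashes or newlines,
    so no escaping is done here.
    """
    lines = [
        "# HELP smart_attribute_raw Raw SMART attribute value per device.",
        "# TYPE smart_attribute_raw gauge",
    ]
    for (host, device), attrs in sorted(samples.items()):
        for attr_id, raw in sorted(attrs.items()):
            lines.append(
                f'smart_attribute_raw{{host="{host}",device="{device}",'
                f'attr="{attr_id}"}} {raw}'
            )
    return "\n".join(lines) + "\n"


print(render_prometheus({("node1", "sda"): {5: 0, 197: 8}}), end="")
```
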

Part 5 is to do some basic failure prediction!

SMART is unreliable on spinning disks, and on SSDs it's only as reliable as
the firmware (which is often questionable).
Also, many vendors interpret SMART attributes differently, making some of
the obvious choices (like power-on hours or power-cycle count) useless
(see https://www.backblaze.com/blog/hard-drive-smart-stats/ for an
example).
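
Given those caveats, a "basic" predictor would probably steer clear of
vendor-specific attributes. A naive heuristic sketch, not a trained model:
flag a drive only if (a) an attribute's normalized value has dropped to or
below its vendor threshold (the drive's own failure criterion), or (b) the
raw value of one of a small, assumed watch-list of error counters (IDs 5,
187, 188, 197, 198, commonly cited as failure indicators) is non-zero.
`WATCHED_COUNTERS` and `failure_warnings` are hypothetical names.

```python
# Assumed watch-list of error counters commonly cited as failure indicators.
# Reading their raw values as plain counts is itself an assumption: some
# vendors pack several counters into one raw value.
WATCHED_COUNTERS = {5, 187, 188, 197, 198}


def failure_warnings(attrs):
    """Return human-readable warnings for one drive.

    attrs maps attribute id -> {"name", "value", "thresh", "raw"}: the
    normalized value, the vendor threshold and the raw value.
    """
    warnings = []
    for attr_id, a in sorted(attrs.items()):
        # A threshold of 0 means the vendor never marks this attribute failed.
        if a["thresh"] > 0 and a["value"] <= a["thresh"]:
            warnings.append(
                f"{a['name']}: normalized {a['value']} <= threshold {a['thresh']}"
            )
        elif attr_id in WATCHED_COUNTERS and a["raw"] > 0:
            warnings.append(f"{a['name']}: raw count {a['raw']}")
    return warnings


drive = {
    5: {"name": "Reallocated_Sector_Ct", "value": 100, "thresh": 10, "raw": 0},
    9: {"name": "Power_On_Hours", "value": 12, "thresh": 0, "raw": 70000},
    197: {"name": "Current_Pending_Sector", "value": 100, "thresh": 0, "raw": 8},
}
print(failure_warnings(drive))  # ['Current_Pending_Sector: raw count 8']
```

Power-on hours is deliberately ignored even though its raw value is large.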

Anyway, we'd love it if this feature could be disabled completely via a
config change and didn't introduce any backwards incompatibility by itself.
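
One way to meet that, sketched with a hypothetical option name
(`smart_monitoring_enabled`; no such Ceph option is implied) and a
hypothetical `maybe_collect` helper: default the option to off and make the
disabled path a strict no-op, so an upgraded cluster with the option unset
neither touches devices nor emits new data. Defaulting to off is a design
choice here, not something asked for above.

```python
def maybe_collect(config, devices, read_smart):
    """Collect SMART data only when the hypothetical option is enabled.

    With the option absent or false this returns None without touching any
    device, so behaviour matches a build that lacks the feature entirely.
    """
    # Hypothetical option name; defaulting to False keeps upgrades inert.
    if not config.get("smart_monitoring_enabled", False):
        return None
    return {dev: read_smart(dev) for dev in devices}


calls = []


def reader(dev):
    # Records which devices were read, to show the disabled path reads none.
    calls.append(dev)
    return {"5": 0}


print(maybe_collect({}, ["sda"], reader), calls)  # None []
```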

--
Piotr Dałek
piotr.dalek@xxxxxxxxxxxx
https://www.ovh.com/us/