Re: Reliability models

Kyle Bader <kyle@xxxxxxxxxxx> · Mon, 13 Jan 2014 09:25:14 -0800

Hi Loic,

> IIRC you figured out how to use https://github.com/ceph/ceph-tools/tree/master/models . Do you happen to have some kind of HOWTO or even the list of commands you've used to get me started ?

git clone ...ceph-tools
cd ceph-tools/models/reliability
./main.py -g

At this point a graphical form should pop up, and most of it should be
straightforward. NRE rate is the rate of non-recoverable read errors;
the NRE model determines how an array/disk responds to a
non-recoverable read error event. I tend to be on the conservative
end, having seen my share of RAID horror shows, so I set the NRE model
to "fail". The other gotcha is that most disk manufacturers quote MTBF
or AFR, while the reliability modeling tool asks for FITs. The formula
you will need is:

MTBF (hours) = 1,000,000,000 x 1/FIT, i.e. FIT = 1,000,000,000 / MTBF (hours). [1]
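
In case it helps, here is a rough sketch of that conversion in Python.
The function names are my own and are not part of ceph-tools. The AFR
conversion assumes a constant failure rate (exponential model) and a
drive that is powered on 24x365, so treat the result as an
approximation rather than what the vendor necessarily means by AFR.

```python
import math

# Assumption: the drive is powered on all year (24 x 365 hours).
HOURS_PER_YEAR = 24 * 365


def mtbf_hours_to_fit(mtbf_hours):
    """Convert MTBF in hours to FIT (failures per 10^9 device-hours)."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return 1e9 / mtbf_hours


def afr_to_fit(afr):
    """Convert an annualized failure rate (fraction, e.g. 0.02) to FIT.

    Assumes a constant failure rate, so AFR = 1 - exp(-rate * hours/year).
    """
    if not 0 <= afr < 1:
        raise ValueError("AFR must be a fraction in [0, 1)")
    rate_per_hour = -math.log1p(-afr) / HOURS_PER_YEAR
    return rate_per_hour * 1e9


if __name__ == "__main__":
    # Hypothetical datasheet values, for illustration only.
    print("FIT from MTBF of 1.2M hours:", mtbf_hours_to_fit(1_200_000))
    print("FIT from AFR of 2%:", afr_to_fit(0.02))
```

Whichever figure the datasheet gives you, the resulting FIT value is
what goes into the form.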

Stripe length is the number of RADOS objects that are required to
store a blob of data, i.e. all the RADOS objects that compose an RBD
volume. Since this code was written a while ago, there is no code to
support modeling erasure-coded objects. If you have any questions, let
me know!

[1] http://en.wikipedia.org/wiki/Failure_rate#Units

-- 
Kyle Bader - Inktank
Senior Solution Architect