Reliability model for RADOS - effects during second failures

Koleos Fuscus <koleosfuscus@xxxxxxxxx> · Thu, 3 Jul 2014 00:33:53 +0200

Hi Kyle, Loic,

The current code uses a “FIT rate multiplier” to account for effects
such as operations done in parallel. That multiplier (n) affects
Pfail. For the initial failure, it is calculated from the number of
replicas and the stripe count, as seen in
https://github.com/ceph/ceph-tools/blob/master/models/reliability/RadosRely.py#L86.
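
To make the role of the multiplier concrete, here is a minimal sketch of
how n could scale Pfail. It is not copied from RadosRely.py: the function
names (p_fail, initial_multiplier), the Poisson failure model, and the
choice of replicas * stripe_count as the multiplier are all assumptions
made for illustration only.

```python
import math

# One FIT is one failure per 10^9 device-hours.
HOURS_PER_FIT_UNIT = 1e9


def p_fail(fit_rate, hours, n=1.0):
    """Probability of at least one failure within `hours`.

    Assumes failures follow a Poisson process whose rate is the
    per-device FIT rate scaled by the multiplier n.
    """
    expected_failures = n * fit_rate * hours / HOURS_PER_FIT_UNIT
    return 1.0 - math.exp(-expected_failures)


def initial_multiplier(replicas, stripe_count):
    """Hypothetical multiplier for the initial failure.

    Assumption: every replica of every stripe sits on its own disk,
    so the number of disks that can fail is their product.
    """
    return replicas * stripe_count


if __name__ == "__main__":
    n = initial_multiplier(replicas=3, stripe_count=4)
    one_year = 24 * 365
    print(p_fail(fit_rate=1000, hours=one_year))        # single disk
    print(p_fail(fit_rate=1000, hours=one_year, n=n))   # scaled by n
```

Under these assumptions a larger n only raises the expected number of
failures in the period; it says nothing about which copies are affected.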

What does not make sense to me is the way the multiplier is
calculated for the failure of the remaining copies in
https://github.com/ceph/ceph-tools/blob/master/models/reliability/RadosRely.py#L92
Why are the stripes not taken into account? What is the purpose of
using the “declustering factor” in that equation? Is that equation
correct? I read this note by Sage
https://www.mail-archive.com/ceph-devel@xxxxxxxxxxxxxxx/msg01650.html
that tries to clarify the role of PGs, but it didn’t help me understand it.

Besides, I have a simple question related to the equation on L86 for
the initial failure. The striping process splits user content into a
number of objects equal to the stripe count. That group of objects
constitutes an object set. Each object is composed of one or more
stripe units. All stripe units in a stripe (stripe count of them) are
written in parallel. Typically each object is mapped to a different
disk. What happens when the object set is full and a new object is
started? Are these new objects assigned to the same disks used for the
previous full object set?
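
The striping layout described above can be sketched as a mapping from a
byte offset to an object. This is only an illustration of the
round-robin scheme as I understand it; the function name locate and its
parameter names are hypothetical, and it assumes the object size is a
whole multiple of the stripe unit.

```python
def locate(offset, stripe_unit, stripe_count, object_size):
    """Map a byte offset to (object_set_no, object_no, offset_in_object).

    Stripe units are dealt round-robin across `stripe_count` objects.
    Once those objects reach `object_size`, a new object set starts
    and its objects get fresh object numbers.
    """
    if object_size % stripe_unit != 0:
        raise ValueError("object_size must be a multiple of stripe_unit")

    units_per_object = object_size // stripe_unit

    unit_no = offset // stripe_unit            # which stripe unit overall
    stripe_no = unit_no // stripe_count        # which stripe (row)
    stripe_pos = unit_no % stripe_count        # which object within the set

    object_set_no = stripe_no // units_per_object
    object_no = object_set_no * stripe_count + stripe_pos

    offset_in_object = ((stripe_no % units_per_object) * stripe_unit
                        + offset % stripe_unit)
    return object_set_no, object_no, offset_in_object


if __name__ == "__main__":
    # Toy sizes: 1-byte stripe units, 3 objects per set, 2 units per object.
    for off in range(8):
        print(off, locate(off, stripe_unit=1, stripe_count=3, object_size=2))
```

With these toy sizes, offsets 0 to 5 fill objects 0, 1 and 2, and offset 6
opens a second object set starting at object 3. The sketch does not model
placement, so it says nothing about which disks those new objects land on.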

Best

koleosfuscus

________________________________________________________________
"My reply is: the software has no known bugs, therefore it has not
been updated."
Wietse Venema