Hello all,

Thanks for the comments. It is true that the state model has too many details. I do not see that level of detail in the current code, and I fully agree with that choice. As the original author said, we need to “assert only failure cases with statistically interesting consequences”.

Still, I drew the diagram for two reasons: 1) to get an overall idea of how and when the processes related to detecting and repairing errors are launched, and 2) to see whether other reliability models may apply to the Ceph case.

Overall, I think the figure is much better than the diagram published at http://ceph.com/docs/firefly/dev/peering/, which is what prompted me to do something about it. Initially, I was trying to understand the peering process and everything that happens when an OSD fails peering. In the future, I would like to produce a better version and maybe include it in the paper.

Best,
koleosfuscus

________________________________________________________________
"My reply is: the software has no known bugs, therefore it has not been updated."
Wietse Venema

On Wed, Jul 2, 2014 at 9:28 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> Hi Koleofuscus, Loic,
>
> I see several minor inaccuracies on that diagram, but my main question
> (like Loic) is whether they are relevant for the reliability model.
>
> For example, backfill and 'remap and get sync' are more or less equivalent
> (tho backfill is missing the 'delete stray replicas' state after); it make
> little difference from a durability perspective whether it is the primary
> or a replica that fails unless you are watching the messages go by. And I
> think the exceptional cases (unfound objects) can be simplified.
>
> sage
>
>
> On Tue, 1 Jul 2014, Loic Dachary wrote:
>
>>
>> Hi koleofuscus,
>>
>> I took a look at
>> https://wiki.ceph.com/Development/Add_erasure_coding_to_the_durability_model/Technical_details_on_the_model
>> and it makes sense to me. However I wonder why you would want to go into
>> that level of detail ?
>> Could you point me where you would encode such
>> details events in the current code ?
>>
>> Cheers
>>
>> On 26/06/2014 02:29, Koleos Fuscus wrote:
>> > Hi,
>> >
>> > I found a state model diagram in the page that explains peering
>> > (http://ceph.com/docs/firefly/dev/peering/). The image cannot be
>> > visualised correctly. I build the docs and could open the image in an
>> > appropriate size using graphviz but still the layout is too complex
>> > and messy.
>> > I decided to do my own state model to collaborate in the understanding
>> > of ceph. Actually, it helps me to measure my own understanding of the
>> > system. Using the diagram facilitates the decision of modeling one
>> > event or skip another. You can find the diagram here:
>> > https://wiki.ceph.com/Development/Add_erasure_coding_to_the_durability_model/Technical_details_on_the_model
>> >
>> > Please give me your comments.
>> >
>> > koleosfuscus
>> >
>> > ________________________________________________________________
>> > "My reply is: the software has no known bugs, therefore it has not
>> > been updated."
>> > Wietse Venema
>> >
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>