Hi Dale,

Can you please post the ceph status? I'm no expert, but I would make sure
that the datacenter you intend to keep operating (while the connection gets
re-established) has two active monitors.
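For example, output from something like the following (run from a node that
still has the admin keyring and can reach a monitor) would show how many
mons are in quorum and which OSDs each side can still see:

  ceph status
  ceph health detail
  # monitor map and current quorum membership
  ceph mon dump
  ceph quorum_status --format json-pretty
  # which OSDs are up/down, per datacenter bucket
  ceph osd tree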
Thanks,
Yanko.

> On Nov 29, 2022, at 7:20 AM, Wolfpaw - Dale Corse <dale@xxxxxxxxxxx> wrote:
> 
> Hi All,
> 
> We had a fiber cut tonight between 2 data centers, and a ceph cluster
> didn't do very well :( We ended up with 98% of PGs down.
> 
> This setup has 2 data centers defined, with 4 copies across both, and a
> minimum size of 1. We have 1 mon/mgr in each DC, with one in a 3rd data
> center connected to each of the other 2 by VPN.
> 
> When I did a pg query on the PGs that were stuck, it said they were
> blocked from coming up because they couldn't contact 2 of the OSDs
> (located in the other data center that it was unable to reach) .. but
> the other 2 were fine.
> 
> I'm at a loss, because this was exactly the thing we thought we had set
> it up to prevent .. and with size = 4 and min_size set to 1, I understood
> that it would continue without a problem? :(
> 
> Crush map is below .. if anyone has any ideas, I would sincerely
> appreciate it :)
> 
> Thanks!
> Dale
> 
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> tunable straw_calc_version 1
> 
> # devices
> device 0 osd.0 class ssd
> device 1 osd.1 class ssd
> device 2 osd.2 class ssd
> device 3 osd.3 class ssd
> device 4 osd.4 class ssd
> device 5 osd.5 class ssd
> device 6 osd.6 class ssd
> device 7 osd.7 class ssd
> device 8 osd.8 class ssd
> device 9 osd.9 class ssd
> device 10 osd.10 class ssd
> device 11 osd.11 class ssd
> device 12 osd.12 class ssd
> device 13 osd.13 class ssd
> device 14 osd.14 class ssd
> device 15 osd.15 class ssd
> device 16 osd.16 class ssd
> device 17 osd.17 class ssd
> device 18 osd.18 class ssd
> device 19 osd.19 class ssd
> device 20 osd.20 class ssd
> device 21 osd.21 class ssd
> device 22 osd.22 class ssd
> device 23 osd.23 class ssd
> device 24 osd.24 class ssd
> device 25 osd.25 class ssd
> device 26 osd.26 class ssd
> device 27 osd.27 class ssd
> device 28 osd.28 class ssd
> device 29 osd.29 class ssd
> device 30 osd.30 class ssd
> device 31 osd.31 class ssd
> device 32 osd.32 class ssd
> device 33 osd.33 class ssd
> device 34 osd.34 class ssd
> device 35 osd.35 class ssd
> device 36 osd.36 class ssd
> device 37 osd.37 class ssd
> device 38 osd.38 class ssd
> device 39 osd.39 class ssd
> device 40 osd.40 class ssd
> device 41 osd.41 class ssd
> device 42 osd.42 class ssd
> device 43 osd.43 class ssd
> device 44 osd.44 class ssd
> device 45 osd.45 class ssd
> device 46 osd.46 class ssd
> device 47 osd.47 class ssd
> device 49 osd.49 class ssd
> 
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
> 
> # buckets
> host Pnode01 {
>     id -8              # do not change unnecessarily
>     id -9 class ssd    # do not change unnecessarily
>     # weight 0.000
>     alg straw2
>     hash 0             # rjenkins1
> }
> host node01 {
>     id -2              # do not change unnecessarily
>     id -15 class ssd   # do not change unnecessarily
>     # weight 14.537
>     alg straw2
>     hash 0             # rjenkins1
>     item osd.4 weight 1.817
>     item osd.1 weight 1.817
>     item osd.3 weight 1.817
>     item osd.2 weight 1.817
>     item osd.6 weight 1.817
>     item osd.9 weight 1.817
>     item osd.5 weight 1.817
>     item osd.0 weight 1.818
> }
> host node02 {
>     id -3              # do not change unnecessarily
>     id -16 class ssd   # do not change unnecessarily
>     # weight 14.536
>     alg straw2
>     hash 0             # rjenkins1
>     item osd.10 weight 1.817
>     item osd.11 weight 1.817
>     item osd.12 weight 1.817
>     item osd.13 weight 1.817
>     item osd.14 weight 1.817
>     item osd.15 weight 1.817
>     item osd.16 weight 1.817
>     item osd.19 weight 1.817
> }
> host node03 {
>     id -4              # do not change unnecessarily
>     id -17 class ssd   # do not change unnecessarily
>     # weight 14.536
>     alg straw2
>     hash 0             # rjenkins1
>     item osd.20 weight 1.817
>     item osd.21 weight 1.817
>     item osd.22 weight 1.817
>     item osd.23 weight 1.817
>     item osd.25 weight 1.817
>     item osd.26 weight 1.817
>     item osd.29 weight 1.817
>     item osd.24 weight 1.817
> }
> datacenter EDM1 {
>     id -11             # do not change unnecessarily
>     id -14 class ssd   # do not change unnecessarily
>     # weight 43.609
>     alg straw
>     hash 0             # rjenkins1
>     item node01 weight 14.537
>     item node02 weight 14.536
>     item node03 weight 14.536
> }
> host node04 {
>     id -5              # do not change unnecessarily
>     id -18 class ssd   # do not change unnecessarily
>     # weight 14.536
>     alg straw2
>     hash 0             # rjenkins1
>     item osd.30 weight 1.817
>     item osd.31 weight 1.817
>     item osd.32 weight 1.817
>     item osd.33 weight 1.817
>     item osd.34 weight 1.817
>     item osd.35 weight 1.817
>     item osd.36 weight 1.817
>     item osd.39 weight 1.817
> }
> host node05 {
>     id -6              # do not change unnecessarily
>     id -19 class ssd   # do not change unnecessarily
>     # weight 14.536
>     alg straw2
>     hash 0             # rjenkins1
>     item osd.40 weight 1.817
>     item osd.41 weight 1.817
>     item osd.42 weight 1.817
>     item osd.43 weight 1.817
>     item osd.44 weight 1.817
>     item osd.45 weight 1.817
>     item osd.46 weight 1.817
>     item osd.49 weight 1.817
> }
> host node06 {
>     id -7              # do not change unnecessarily
>     id -20 class ssd   # do not change unnecessarily
>     # weight 16.353
>     alg straw2
>     hash 0             # rjenkins1
>     item osd.47 weight 1.817
>     item osd.37 weight 1.817
>     item osd.27 weight 1.817
>     item osd.38 weight 1.817
>     item osd.7 weight 1.817
>     item osd.28 weight 1.817
>     item osd.8 weight 1.817
>     item osd.17 weight 1.817
>     item osd.18 weight 1.817
> }
> datacenter EDM3 {
>     id -12             # do not change unnecessarily
>     id -13 class ssd   # do not change unnecessarily
>     # weight 45.425
>     alg straw
>     hash 0             # rjenkins1
>     item node04 weight 14.536
>     item node05 weight 14.536
>     item node06 weight 16.353
> }
> datacenter EDM2 {
>     id -10             # do not change unnecessarily
>     id -22 class ssd   # do not change unnecessarily
>     # weight 0.000
>     alg straw
>     hash 0             # rjenkins1
> }
> root default {
>     id -1              # do not change unnecessarily
>     id -21 class ssd   # do not change unnecessarily
>     # weight 89.034
>     alg straw2
>     hash 0             # rjenkins1
>     item Pnode01 weight 0.000
>     item EDM1 weight 43.609
>     item EDM3 weight 45.425
>     item EDM2 weight 0.000
> }
> 
> # rules
> rule replicated_ruleset {
>     id 0
>     type replicated
>     min_size 1
>     max_size 10
>     step take default
>     step choose firstn 2 type datacenter
>     step chooseleaf firstn 2 type host
>     step emit
> }
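It might also help to double-check what the pool itself reports versus what
you expect, and to replay the rule offline; something like this should do it
(the pool name is a placeholder, and rule id 0 is taken from the map above):

  # what the pool actually has configured
  ceph osd pool get <pool> size
  ceph osd pool get <pool> min_size
  # pull the compiled CRUSH map and test rule 0 with 4 replicas
  ceph osd getcrushmap -o crushmap.bin
  crushtool -i crushmap.bin --test --rule 0 --num-rep 4 --show-mappings
  # decompile to text to compare against what's posted above
  crushtool -d crushmap.bin -o crushmap.txt

The --test output lists the OSDs each sample PG would map to, which should
make it easy to see whether every PG really gets two hosts in each datacenter.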
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx