Hello Josh,

I simulated the osd.14 failure with the following steps:

1. hot unplug the disk
2. systemctl stop ceph-osd@14
3. ceph osd out 14

The CRUSH rule used to create the EC8+3 pool is shown below:

# ceph osd crush rule dump erasure_hdd_mhosts
{
    "rule_id": 8,
    "rule_name": "erasure_hdd_mhosts",
    "ruleset": 8,
    "type": 3,
    "min_size": 1,
    "max_size": 16,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_indep",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}

The output of `ceph osd tree` is also attached:

[~] # ceph osd tree
ID  CLASS WEIGHT   TYPE NAME           STATUS REWEIGHT PRI-AFF
 -1       32.36148 root default
-13        2.69679     host jceph-n01
  0   hdd  0.89893         osd.0           up  1.00000 1.00000
  1   hdd  0.89893         osd.1           up  1.00000 1.00000
  2   hdd  0.89893         osd.2           up  1.00000 1.00000
-17        2.69679     host jceph-n02
  3   hdd  0.89893         osd.3           up  1.00000 1.00000
  4   hdd  0.89893         osd.4           up  1.00000 1.00000
  5   hdd  0.89893         osd.5           up  1.00000 1.00000
-21        2.69679     host jceph-n03
  6   hdd  0.89893         osd.6           up  1.00000 1.00000
  7   hdd  0.89893         osd.7           up  1.00000 1.00000
  8   hdd  0.89893         osd.8           up  1.00000 1.00000
-25        2.69679     host jceph-n04
  9   hdd  0.89893         osd.9           up  1.00000 1.00000
 10   hdd  0.89893         osd.10          up  1.00000 1.00000
 11   hdd  0.89893         osd.11          up  1.00000 1.00000
-29        2.69679     host jceph-n05
 12   hdd  0.89893         osd.12          up  1.00000 1.00000
 13   hdd  0.89893         osd.13          up  1.00000 1.00000
 14   hdd  0.89893         osd.14          up  1.00000 1.00000
-33        2.69679     host jceph-n06
 15   hdd  0.89893         osd.15          up  1.00000 1.00000
 16   hdd  0.89893         osd.16          up  1.00000 1.00000
 17   hdd  0.89893         osd.17          up  1.00000 1.00000
-37        2.69679     host jceph-n07
 18   hdd  0.89893         osd.18          up  1.00000 1.00000
 19   hdd  0.89893         osd.19          up  1.00000 1.00000
 20   hdd  0.89893         osd.20          up  1.00000 1.00000
-41        2.69679     host jceph-n08
 21   hdd  0.89893         osd.21          up  1.00000 1.00000
 22   hdd  0.89893         osd.22          up  1.00000 1.00000
 23   hdd  0.89893         osd.23          up  1.00000 1.00000
-45        2.69679     host jceph-n09
 24   hdd  0.89893         osd.24          up  1.00000 1.00000
 25   hdd  0.89893         osd.25          up  1.00000 1.00000
 26   hdd  0.89893         osd.26          up  1.00000 1.00000
-49        2.69679     host jceph-n10
 27   hdd  0.89893         osd.27          up  1.00000 1.00000
 28   hdd  0.89893         osd.28          up  1.00000 1.00000
 29   hdd  0.89893         osd.29          up  1.00000 1.00000
-53        2.69679     host jceph-n11
 30   hdd  0.89893         osd.30          up  1.00000 1.00000
 31   hdd  0.89893         osd.31          up  1.00000 1.00000
 32   hdd  0.89893         osd.32          up  1.00000 1.00000
-57        2.69679     host jceph-n12
 33   hdd  0.89893         osd.33          up  1.00000 1.00000
 34   hdd  0.89893         osd.34          up  1.00000 1.00000
 35   hdd  0.89893         osd.35          up  1.00000 1.00000
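The rule can also be replayed offline with crushtool to double-check
that every mapping spreads the 11 chunks across 11 different hosts.
Something along these lines (crushmap.bin is just a scratch file name):

# ceph osd getcrushmap -o crushmap.bin
# crushtool -i crushmap.bin --test --rule 8 --num-rep 11 --show-mappings

Each emitted mapping should list 11 OSDs belonging to 11 distinct
hosts; adding --show-bad-mappings should print only the inputs for
which the rule cannot produce a complete mapping.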
Thanks for your help.

- Jerry

On Fri, 23 Jul 2021 at 22:40, Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
>
> Hi Jerry,
>
> In general, your CRUSH rules should define the behaviour you're
> looking for. Based on what you've stated about your configuration,
> after failing a single node or an OSD on a single node, you should
> still be able to tolerate two more failures in the system without
> losing data (or losing access to data, given that min_size=k, though
> I believe it's recommended to set min_size=k+1).
>
> However, that sequence of acting sets doesn't make a whole lot of
> sense to me for a single OSD failure (though perhaps I'm misreading
> them). Can you clarify exactly how you simulated the osd.14 failure?
> It might also be helpful to post your CRUSH rule and "ceph osd tree".
>
> Josh
>
> On Fri, Jul 23, 2021 at 1:42 AM Jerry Lee <leisurelysw24@xxxxxxxxx> wrote:
> >
> > Hello,
> >
> > I would like to know the maximum number of node failures for an
> > EC8+3 pool in a 12-node cluster with 3 OSDs in each node. The size
> > and min_size of the EC8+3 pool are configured as 11 and 8, and the
> > OSDs of each PG are selected by host. When there is no node
> > failure, the maximum number of node failures is 3, right? After
> > unplugging an OSD (osd.14) in the cluster, I checked the PG acting
> > set changes, and one of the results is shown below:
> >
> > T0:
> > [15,31,11,34,28,1,8,26,14,19,5]
> >
> > T1: after osd.14 was unplugged and recovery started
> > [15,31,11,34,28,1,8,26,NONE,19,5]
> >
> > T2:
> > [15,31,11,34,21,1,8,26,19,29,5]
> >
> > T3:
> > [15,31,11,34,NONE,1,8,26,NONE,NONE,5]
> >
> > T4: recovery was done
> > [15,31,11,34,21,1,8,26,19,29,5]
> >
> > For this PG, 3 OSD peers changed during the recovery process
> > ([_,_,_,_,28->21,_,_,_,14->19,19->29,_]). It seems that only
> > min_size (8) chunks of the EC8+3 pool are kept during recovery.
> > Does it mean that no more node failures can be tolerated between
> > T3 and T4? Can we calculate the maximum number of node failures by
> > examining all the acting sets of the PGs? Is there some simple way
> > to obtain such information? Any ideas and feedback are appreciated,
> > thanks!
> >
> > - Jerry
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
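A short footnote on the question above about reading the remaining
failure tolerance off the PG acting sets: newer Ceph releases can be
asked directly whether a given set of OSDs could be stopped without
reducing immediate data availability, which gives much the same answer
without parsing acting sets by hand. A rough sketch (the OSD ids and
pool name are only examples/placeholders):

# ceph osd ok-to-stop 15 16 17
# ceph pg ls-by-pool <ec-pool-name> undersized

The first command asks whether the three OSDs of jceph-n06 (i.e. one
more whole host) could go down right now without, as far as I
understand it, any PG dropping below min_size; the second lists the
PGs of the pool that are currently undersized, whose acting sets are
the ones to watch during recovery.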