Unfortunately I can't verify whether Ceph reports any inactive PGs. As soon as the second host goes down, practically everything locks up; nothing is printed even by "ceph -w". Only after dcs2 comes back does the cluster report that those OSDs were down.

Note: apparently there was a new update recently. In my test environment this did not happen: dcs1 stayed UP with all services running and kept serving reads and writes even with dcs2 DOWN, even before dcs3 had been added.

### COMMANDS ###

[ceph: root@dcs1 /]# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1         65.49570  root default
-3         32.74785      host dcs1
 0    hdd   2.72899          osd.0      up   1.00000  1.00000
 1    hdd   2.72899          osd.1      up   1.00000  1.00000
 2    hdd   2.72899          osd.2      up   1.00000  1.00000
 3    hdd   2.72899          osd.3      up   1.00000  1.00000
 4    hdd   2.72899          osd.4      up   1.00000  1.00000
 5    hdd   2.72899          osd.5      up   1.00000  1.00000
 6    hdd   2.72899          osd.6      up   1.00000  1.00000
 7    hdd   2.72899          osd.7      up   1.00000  1.00000
 8    hdd   2.72899          osd.8      up   1.00000  1.00000
 9    hdd   2.72899          osd.9      up   1.00000  1.00000
10    hdd   2.72899          osd.10     up   1.00000  1.00000
11    hdd   2.72899          osd.11     up   1.00000  1.00000
-5         32.74785      host dcs2
12    hdd   2.72899          osd.12     up   1.00000  1.00000
13    hdd   2.72899          osd.13     up   1.00000  1.00000
14    hdd   2.72899          osd.14     up   1.00000  1.00000
15    hdd   2.72899          osd.15     up   1.00000  1.00000
16    hdd   2.72899          osd.16     up   1.00000  1.00000
17    hdd   2.72899          osd.17     up   1.00000  1.00000
18    hdd   2.72899          osd.18     up   1.00000  1.00000
19    hdd   2.72899          osd.19     up   1.00000  1.00000
20    hdd   2.72899          osd.20     up   1.00000  1.00000
21    hdd   2.72899          osd.21     up   1.00000  1.00000
22    hdd   2.72899          osd.22     up   1.00000  1.00000
23    hdd   2.72899          osd.23     up   1.00000  1.00000

[ceph: root@dcs1 /]# ceph osd pool ls detail
pool 1 '.mgr' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 26 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 2 'cephfs.ovirt_hosted_engine.meta' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 77 lfor 0/0/47 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 3 'cephfs.ovirt_hosted_engine.data' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 179 lfor 0/0/47 flags hashpspool max_bytes 107374182400 stripe_width 0 application cephfs
pool 6 '.nfs' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 254 lfor 0/0/252 flags hashpspool stripe_width 0 application nfs
pool 7 'cephfs.ovirt_storage_sas.meta' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 322 lfor 0/0/287 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 8 'cephfs.ovirt_storage_sas.data' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 291 lfor 0/0/289 flags hashpspool stripe_width 0 application cephfs
pool 9 'cephfs.ovirt_storage_iso.meta' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 356 lfor 0/0/325 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 10 'cephfs.ovirt_storage_iso.data' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 329 lfor 0/0/327 flags hashpspool stripe_width 0 application cephfs
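For completeness, the same size/min_size values can be confirmed per pool as well; a quick sketch against the affected data pool from the listing above:

# confirm replication settings of the data pool (should match the detail output: size 2, min_size 1)
ceph osd pool get cephfs.ovirt_hosted_engine.data size
ceph osd pool get cephfs.ovirt_hosted_engine.data min_size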
[ceph: root@dcs1 /]# ceph osd crush rule dump replicated_rule
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "type": 1,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
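To double-check what this rule does with only two hosts carrying OSDs, the placements can also be simulated offline with crushtool; a sketch, assuming crushtool is available in the cephadm shell (rule 0 and 2 replicas match the dump and pool settings above):

# export the in-use CRUSH map and simulate rule 0 with 2 replicas
ceph osd getcrushmap -o /tmp/crush.bin
crushtool -i /tmp/crush.bin --test --rule 0 --num-rep 2 --show-mappings | head
# each mapping should list one OSD from dcs1 and one from dcs2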
[ceph: root@dcs1 /]# ceph pg ls-by-pool cephfs.ovirt_hosted_engine.data
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP LAST_SCRUB_DURATION SCRUB_SCHEDULING
3.0 69 0 0 0 285213095 0 0 10057 active+clean 41m 530'20632 530:39461 [1,23]p1 [1,23]p1 2022-10-13T03:19:33.649837+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T07:24:46.314217+0000
3.1 58 0 0 0 242319360 0 0 10026 active+clean 41m 530'11926 530:21424 [6,19]p6 [6,19]p6 2022-10-13T02:15:23.395162+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T11:42:17.682881+0000
3.2 71 0 0 0 294629376 0 0 10012 active+clean 41m 530'12312 530:25506 [10,16]p10 [10,16]p10 2022-10-13T06:12:48.839013+0000 2022-10-11T21:09:49.405860+0000 1 periodic scrub scheduled @ 2022-10-14T12:35:23.917129+0000
3.3 63 0 0 0 262520832 0 0 10073 active+clean 41m 530'20204 530:42834 [13,11]p13 [13,11]p13 2022-10-13T01:16:17.672947+0000 2022-10-11T16:43:27.935298+0000 1 periodic scrub scheduled @ 2022-10-14T11:48:42.643271+0000
3.4 59 0 0 0 240611328 0 0 10017 active+clean 41m 530'17883 530:32537 [10,22]p10 [10,22]p10 2022-10-12T22:09:09.376552+0000 2022-10-10T15:00:52.196397+0000 1 periodic scrub scheduled @ 2022-10-14T01:16:35.682204+0000
3.5 67 0 0 0 281018368 0 0 10017 active+clean 41m 530'18825 530:31531 [18,3]p18 [18,3]p18 2022-10-12T18:13:50.835870+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T02:17:12.292237+0000
3.6 60 0 0 0 239497216 0 0 10079 active+clean 41m 530'22537 530:34790 [0,21]p0 [0,21]p0 2022-10-12T20:38:44.998414+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T08:12:12.106892+0000
3.7 54 0 0 0 221261824 0 0 10082 active+clean 41m 530'30718 530:37349 [4,12]p4 [4,12]p4 2022-10-12T20:26:54.091307+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-13T20:51:54.792643+0000
3.8 70 0 0 0 293588992 0 0 4527 active+clean 41m 530'4527 530:16905 [11,21]p11 [11,21]p11 2022-10-13T07:16:50.226814+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T13:02:27.444761+0000
3.9 47 0 0 0 192938407 0 0 10065 active+clean 41m 530'11065 530:21345 [19,11]p19 [19,11]p19 2022-10-13T05:05:36.274216+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T08:17:25.165367+0000
3.a 60 0 0 0 251658240 0 0 10044 active+clean 41m 530'14744 530:23145 [18,1]p18 [18,1]p18 2022-10-13T04:29:38.891055+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T11:10:38.556482+0000
3.b 52 0 0 0 209567744 0 0 4949 active+clean 41m 530'4949 530:26757 [7,23]p7 [7,23]p7 2022-10-12T22:08:45.621201+0000 2022-10-10T15:00:36.799456+0000 1 periodic scrub scheduled @ 2022-10-14T02:28:08.061560+0000
3.c 68 0 0 0 276607307 0 0 10003 active+clean 41m 530'18828 530:39884 [18,8]p18 [18,8]p18 2022-10-12T18:25:36.991393+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T00:43:12.804024+0000
3.d 67 0 0 0 272621888 0 0 6708 active+clean 41m 530'8359 530:33988 [13,7]p13 [13,7]p13 2022-10-12T21:42:29.600145+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-13T23:30:29.341646+0000
3.e 68 0 0 0 276746240 0 0 5178 active+clean 41m 530'5278 530:16051 [13,1]p13 [13,1]p13 2022-10-13T05:47:06.004714+0000 2022-10-11T21:04:57.978685+0000 1 periodic scrub scheduled @ 2022-10-14T11:45:33.438178+0000
3.f 65 0 0 0 269307904 0 0 10056 active+clean 41m 530'34965 530:49963 [23,4]p23 [23,4]p23 2022-10-13T08:58:09.493284+0000 2022-10-10T15:00:36.390467+0000 1 periodic scrub scheduled @ 2022-10-14T12:18:58.610252+0000
3.10 66 0 0 0 271626240 0 0 4272 active+clean 41m 530'4431 530:19010 [12,9]p12 [12,9]p12 2022-10-13T03:52:14.952046+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T07:48:12.441144+0000
3.11 58 0 0 0 239075657 0 0 6466 active+clean 41m 530'8563 530:24677 [18,0]p18 [18,0]p18 2022-10-12T22:25:17.255090+0000 2022-10-10T15:00:43.412084+0000 1 periodic scrub scheduled @ 2022-10-14T03:25:34.048845+0000
3.12 45 0 0 0 186254336 0 0 10084 active+clean 41m 530'16084 530:31273 [6,14]p6 [6,14]p6 2022-10-13T03:05:14.109923+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T03:35:11.159743+0000
3.13 68 0 0 0 275124224 0 0 10013 active+clean 41m 530'28676 530:52278 [16,8]p16 [16,8]p16 2022-10-12T21:46:50.747741+0000 2022-10-11T16:48:56.632027+0000 1 periodic scrub scheduled @ 2022-10-14T07:03:49.125496+0000
3.14 58 0 0 0 240123904 0 0 7531 active+clean 41m 530'8212 530:26075 [23,4]p23 [23,4]p23 2022-10-13T04:25:39.131070+0000 2022-10-13T04:25:39.131070+0000 4 periodic scrub scheduled @ 2022-10-14T05:36:16.428326+0000
3.15 59 0 0 0 247382016 0 0 8890 active+clean 41m 530'8890 530:18892 [23,3]p23 [23,3]p23 2022-10-13T04:45:48.156899+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T14:55:14.651919+0000
3.16 57 0 0 0 237285376 0 0 6900 active+clean 41m 530'8766 530:20717 [19,9]p19 [19,9]p19 2022-10-13T00:13:35.716060+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T07:08:16.779024+0000
3.17 56 0 0 0 234303488 0 0 10012 active+clean 41m 530'21461 530:31490 [0,13]p0 [0,13]p0 2022-10-13T07:42:57.775955+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T14:52:30.758744+0000
3.18 47 0 0 0 197132288 0 0 10001 active+clean 41m 530'14783 530:20829 [10,14]p10 [10,14]p10 2022-10-13T00:41:44.050740+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T09:30:02.438044+0000
3.19 50 0 0 0 209715200 0 0 10058 active+clean 41m 499'19880 530:27891 [8,23]p8 [8,23]p8 2022-10-13T10:58:13.948274+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T19:55:12.268345+0000
3.1a 58 0 0 0 240123904 0 0 10037 active+clean 41m 530'36799 530:50997 [16,9]p16 [16,9]p16 2022-10-13T02:03:18.026427+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T04:55:58.684437+0000
3.1b 53 0 0 0 219996160 0 0 10051 active+clean 41m 530'18388 530:29223 [0,22]p0 [0,22]p0 2022-10-12T19:19:25.675030+0000 2022-10-12T19:19:25.675030+0000 4 periodic scrub scheduled @ 2022-10-14T00:21:49.935082+0000
3.1c 66 0 0 0 276762624 0 0 10027 active+clean 41m 530'16327 530:38127 [20,5]p20 [20,5]p20 2022-10-13T00:04:49.227288+0000 2022-10-10T15:00:38.834351+0000 1 periodic scrub scheduled @ 2022-10-14T01:15:26.524544+0000
3.1d 49 0 0 0 201327104 0 0 10020 active+clean 41m 530'26433 530:51593 [17,9]p17 [17,9]p17 2022-10-13T03:49:02.466987+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T09:04:39.909179+0000
3.1e 61 0 0 0 249098595 0 0 8790 active+clean 41m 530'8790 530:17807 [3,21]p3 [3,21]p3 2022-10-12T22:28:19.417597+0000 2022-10-10T15:00:39.474873+0000 1 periodic scrub scheduled @ 2022-10-13T23:49:55.974786+0000
3.1f 53 0 0 0 222056448 0 0 10053 active+clean 41m 530'35776 530:50234 [0,15]p0 [0,15]p0 2022-10-13T07:16:46.787818+0000 2022-10-10T14:57:31.136809+0000 1 periodic scrub scheduled @ 2022-10-14T16:24:45.860894+0000

* NOTE: Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.
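The next time dcs2 is taken down I can try to capture the PG states before everything locks up; a sketch of what could be run, with a client timeout so the commands give up instead of hanging ("mon.dcs1" and the 15-second timeout are assumptions, cephadm normally names the monitor after the host):

ceph --connect-timeout 15 status
ceph --connect-timeout 15 health detail
ceph --connect-timeout 15 pg dump_stuck inactive
# if even these time out, the surviving monitor can be queried directly over its admin socket
cephadm enter --name mon.dcs1
ceph daemon mon.dcs1 quorum_status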
On Thu, Oct 13, 2022 at 13:54, Eugen Block <eblock@xxxxxx> wrote:

> Could you share more details? Does ceph report inactive PGs when one
> node is down? Please share:
> ceph osd tree
> ceph osd pool ls detail
> ceph osd crush rule dump <rule of affected pool>
> ceph pg ls-by-pool <affected pool>
> ceph -s
>
> Zitat von Murilo Morais <murilo@xxxxxxxxxxxxxx>:
>
> > Thanks for answering.
> > Marc, but there is no mechanism to prevent IO pause? At the moment I don't
> > worry about data loss.
> > I understand that putting it as replica x1 can work, but I need it to be x2.
> >
> > On Thu, Oct 13, 2022 at 12:26, Marc <Marc@xxxxxxxxxxxxxxxxx> wrote:
> >
> >> >
> >> > I'm having strange behavior on a new cluster.
> >>
> >> Not strange, by design
> >>
> >> > I have 3 machines, two of them have the disks. We can name them like this:
> >> > dcs1 to dcs3. The dcs1 and dcs2 machines contain the disks.
> >> >
> >> > I started bootstrapping through dcs1, added the other hosts and left mgr on
> >> > dcs3 only.
> >> >
> >> > What is happening is that if I take down dcs2 everything hangs and becomes
> >> > irresponsible, including the mount points that were pointed to dcs1.
> >>
> >> You have to have disks in 3 machines. (Or set the replication to 1x)
> >>

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx