I managed to solve this problem.

To document the resolution: the firewall was blocking communication between the nodes. After disabling everything related to it and restarting the machine, everything went back to normal. (A sketch of checking the firewall and opening Ceph's ports, rather than disabling it outright, follows the quoted message below.)

On Tue, Nov 1, 2022 at 10:46, Murilo Morais <murilo@xxxxxxxxxxxxxx> wrote:

> Good morning everyone!
>
> Today there was an atypical situation in our cluster where the three
> machines shut down.
>
> On powering up, the cluster came back and formed quorum with no problems,
> but the PGs are all stuck peering and I don't see any disk activity on the
> machines. No PG is active.
>
> [ceph: root@dcs1 /]# ceph osd tree
> ID  CLASS  WEIGHT    TYPE NAME      STATUS  REWEIGHT  PRI-AFF
> -1         98.24359  root default
> -3         32.74786      host dcs1
>  0    hdd   2.72899          osd.0      up   1.00000  1.00000
>  1    hdd   2.72899          osd.1      up   1.00000  1.00000
>  2    hdd   2.72899          osd.2      up   1.00000  1.00000
>  3    hdd   2.72899          osd.3      up   1.00000  1.00000
>  4    hdd   2.72899          osd.4      up   1.00000  1.00000
>  5    hdd   2.72899          osd.5      up   1.00000  1.00000
>  6    hdd   2.72899          osd.6      up   1.00000  1.00000
>  7    hdd   2.72899          osd.7      up   1.00000  1.00000
>  8    hdd   2.72899          osd.8      up   1.00000  1.00000
>  9    hdd   2.72899          osd.9      up   1.00000  1.00000
> 10    hdd   2.72899          osd.10     up   1.00000  1.00000
> 11    hdd   2.72899          osd.11     up   1.00000  1.00000
> -5         32.74786      host dcs2
> 12    hdd   2.72899          osd.12     up   1.00000  1.00000
> 13    hdd   2.72899          osd.13     up   1.00000  1.00000
> 14    hdd   2.72899          osd.14     up   1.00000  1.00000
> 15    hdd   2.72899          osd.15     up   1.00000  1.00000
> 16    hdd   2.72899          osd.16     up   1.00000  1.00000
> 17    hdd   2.72899          osd.17     up   1.00000  1.00000
> 18    hdd   2.72899          osd.18     up   1.00000  1.00000
> 19    hdd   2.72899          osd.19     up   1.00000  1.00000
> 20    hdd   2.72899          osd.20     up   1.00000  1.00000
> 21    hdd   2.72899          osd.21     up   1.00000  1.00000
> 22    hdd   2.72899          osd.22     up   1.00000  1.00000
> 23    hdd   2.72899          osd.23     up   1.00000  1.00000
> -7         32.74786      host dcs3
> 24    hdd   2.72899          osd.24     up   1.00000  1.00000
> 25    hdd   2.72899          osd.25     up   1.00000  1.00000
> 26    hdd   2.72899          osd.26     up   1.00000  1.00000
> 27    hdd   2.72899          osd.27     up   1.00000  1.00000
> 28    hdd   2.72899          osd.28     up   1.00000  1.00000
> 29    hdd   2.72899          osd.29     up   1.00000  1.00000
> 30    hdd   2.72899          osd.30     up   1.00000  1.00000
> 31    hdd   2.72899          osd.31     up   1.00000  1.00000
> 32    hdd   2.72899          osd.32     up   1.00000  1.00000
> 33    hdd   2.72899          osd.33     up   1.00000  1.00000
> 34    hdd   2.72899          osd.34     up   1.00000  1.00000
> 35    hdd   2.72899          osd.35     up   1.00000  1.00000
>
> [ceph: root@dcs1 /]# ceph -s
>   cluster:
>     id:     58bbb950-538b-11ed-b237-2c59e53b80cc
>     health: HEALTH_WARN
>             4 filesystems are degraded
>             4 MDSs report slow metadata IOs
>             Reduced data availability: 1153 pgs inactive, 1101 pgs peering
>             26 slow ops, oldest one blocked for 563 sec, daemons
>             [osd.10,osd.13,osd.14,osd.15,osd.16,osd.18,osd.20,osd.21,osd.24,osd.25]...
>             have slow ops.
>
>   services:
>     mon: 3 daemons, quorum dcs1.evocorp,dcs2,dcs3 (age 7m)
>     mgr: dcs1.evocorp.kyqfcd(active, since 15m), standbys: dcs2.rirtyl
>     mds: 4/4 daemons up, 4 standby
>     osd: 36 osds: 36 up (since 6m), 36 in (since 47m); 65 remapped pgs
>
>   data:
>     volumes: 0/4 healthy, 4 recovering
>     pools:   10 pools, 1153 pgs
>     objects: 254.72k objects, 994 GiB
>     usage:   2.8 TiB used, 95 TiB / 98 TiB avail
>     pgs:     100.000% pgs not active
>              1036 peering
>              65   remapped+peering
>              52   activating
>
> [ceph: root@dcs1 /]# ceph health detail
> HEALTH_WARN 4 filesystems are degraded; 4 MDSs report slow metadata IOs;
> Reduced data availability: 1153 pgs inactive, 1101 pgs peering; 26 slow ops,
> oldest one blocked for 673 sec, daemons
> [osd.10,osd.13,osd.14,osd.15,osd.16,osd.18,osd.20,osd.21,osd.24,osd.25]...
> have slow ops.
> [WRN] FS_DEGRADED: 4 filesystems are degraded
>     fs dc_ovirt is degraded
>     fs dc_iso is degraded
>     fs dc_sas is degraded
>     fs pool_tester is degraded
> [WRN] MDS_SLOW_METADATA_IO: 4 MDSs report slow metadata IOs
>     mds.dc_sas.dcs1.wbyuik(mds.0): 4 slow metadata IOs are blocked > 30 secs, oldest blocked for 1063 secs
>     mds.dc_ovirt.dcs1.lpcazs(mds.0): 4 slow metadata IOs are blocked > 30 secs, oldest blocked for 1058 secs
>     mds.pool_tester.dcs1.ixkkfs(mds.0): 4 slow metadata IOs are blocked > 30 secs, oldest blocked for 1058 secs
>     mds.dc_iso.dcs1.jxqqjd(mds.0): 4 slow metadata IOs are blocked > 30 secs, oldest blocked for 1058 secs
> [WRN] PG_AVAILABILITY: Reduced data availability: 1153 pgs inactive, 1101 pgs peering
>     pg 6.c3 is stuck inactive for 50m, current state peering, last acting [30,15,11]
>     pg 6.c4 is stuck peering for 10h, current state peering, last acting [12,0,26]
>     pg 6.c5 is stuck peering for 10h, current state peering, last acting [12,32,6]
>     pg 6.c6 is stuck peering for 11h, current state peering, last acting [30,4,22]
>     pg 6.c7 is stuck peering for 10h, current state peering, last acting [4,14,26]
>     pg 6.c8 is stuck peering for 10h, current state peering, last acting [0,22,32]
>     pg 6.c9 is stuck peering for 11h, current state peering, last acting [32,20,0]
>     pg 6.ca is stuck peering for 11h, current state peering, last acting [31,0,23]
>     pg 6.cb is stuck peering for 10h, current state peering, last acting [8,35,16]
>     pg 6.cc is stuck peering for 10h, current state peering, last acting [8,24,13]
>     pg 6.cd is stuck peering for 10h, current state peering, last acting [15,25,1]
>     pg 6.ce is stuck peering for 11h, current state peering, last acting [27,23,4]
>     pg 6.cf is stuck peering for 11h, current state peering, last acting [25,4,20]
>     pg 7.c4 is stuck peering for 11m, current state remapped+peering, last acting [19,8]
>     pg 7.c5 is stuck peering for 10h, current state peering, last acting [6,14,32]
>     pg 7.c6 is stuck peering for 10h, current state peering, last acting [14,35,5]
>     pg 7.c7 is stuck peering for 10h, current state remapped+peering, last acting [11,14]
>     pg 7.c8 is stuck peering for 10h, current state peering, last acting [21,9,28]
>     pg 7.c9 is stuck peering for 10h, current state peering, last acting [0,30,15]
>     pg 7.ca is stuck peering for 10h, current state peering, last acting [23,2,26]
>     pg 7.cb is stuck peering for 10h, current state peering, last acting [23,9,24]
>     pg 7.cc is stuck peering for 10h, current state peering, last acting [23,27,0]
>     pg 7.cd is stuck peering for 11m, current state remapped+peering, last acting [13,6]
>     pg 7.ce is stuck peering for 10h, current state peering, last acting [16,1,25]
>     pg 7.cf is stuck peering for 11h, current state peering, last acting [24,16,8]
>     pg 9.c0 is stuck peering for 10h, current state peering, last acting [21,28]
>     pg 9.c1 is stuck peering for 10h, current state peering, last acting [12,31]
>     pg 9.c2 is stuck peering for 10h, current state peering, last acting [6,27]
>     pg 9.c3 is stuck peering for 10h, current state peering, last acting [9,27]
>     pg 9.c4 is stuck peering for 50m, current state peering, last acting [17,34]
>     pg 9.c5 is stuck peering for 11h, current state peering, last acting [31,8]
>     pg 9.c6 is stuck peering for 10h, current state peering, last acting [1,29]
>     pg 9.c7 is stuck peering for 10h, current state peering, last acting [12,30]
>     pg 9.c8 is stuck peering for 11h, current state peering, last acting [26,3]
>     pg 9.c9 is stuck peering for 11h, current state peering, last acting [29,13]
>     pg 9.ca is stuck peering for 11h, current state peering, last acting [25,6]
>     pg 9.cb is stuck peering for 10h, current state peering, last acting [16,9]
>     pg 9.cc is stuck peering for 4h, current state peering, last acting [4,29]
>     pg 10.c0 is stuck peering for 11h, current state peering, last acting [32,19]
>     pg 10.c1 is stuck peering for 10h, current state peering, last acting [23,6]
>     pg 10.c2 is stuck peering for 11h, current state peering, last acting [24,7]
>     pg 10.c3 is stuck peering for 38m, current state peering, last acting [5,20]
>     pg 10.c4 is stuck peering for 10h, current state peering, last acting [21,4]
>     pg 10.c5 is stuck peering for 10h, current state peering, last acting [12,8]
>     pg 10.c6 is stuck peering for 11h, current state peering, last acting [34,7]
>     pg 10.c7 is stuck peering for 10h, current state peering, last acting [17,30]
>     pg 10.c8 is stuck peering for 11h, current state peering, last acting [24,19]
>     pg 10.c9 is stuck inactive for 54m, current state activating, last acting [13,3]
>     pg 10.ca is stuck peering for 10h, current state peering, last acting [16,6]
>     pg 10.cb is stuck peering for 11h, current state peering, last acting [26,13]
>     pg 10.cf is stuck peering for 50m, current state peering, last acting [21,24]
> [WRN] SLOW_OPS: 26 slow ops, oldest one blocked for 673 sec, daemons [osd.10,osd.13,osd.14,osd.15,osd.16,osd.18,osd.20,osd.21,osd.24,osd.25]... have slow ops.
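For anyone who runs into the same symptom (every OSD up and in, but 100% of PGs stuck peering and no disk activity), here is a minimal sketch of the checks and the firewall change that avoids disabling the firewall entirely. It assumes the hosts run firewalld with the interfaces in the default public zone (as on EL-based distributions); adapt it if the nodes use ufw or plain nftables. The PG id is just one of the stuck PGs from the listing above.

# Ask the primary of a stuck PG why it is not going active; the
# recovery_state section of the output shows where peering is stuck,
# e.g. which peer OSDs the primary is still waiting to hear from.
ceph pg 6.c3 query

# Check whether a host firewall is active on each node.
systemctl status firewalld

# Instead of disabling the firewall, open the ports Ceph needs.
# firewalld ships service definitions for this: "ceph-mon" (3300/tcp and
# 6789/tcp) and "ceph" (6800-7300/tcp for OSDs, MDSs and MGRs).
firewall-cmd --zone=public --add-service=ceph-mon --permanent
firewall-cmd --zone=public --add-service=ceph --permanent
firewall-cmd --reload

# The PGs should then move from peering to active+clean.
ceph -s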