I use Ubiquiti equipment, mainly because I'm not a network admin... I
rebooted the 10G switches and now everything is working and recovering. I
hate when there's not a definitive answer, but that's kind of the deal when
you use Ubiquiti stuff. Thank you Sean and Frank. Frank, you were right.
It made no sense because from a very basic point of view the network
seemed fine, but Sean's ping revealed that it clearly wasn't. Thank you!

-jeremy

On Mon, Jul 25, 2022 at 3:08 PM Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:

> Yea, assuming you can ping with a lower MTU, check the MTU on your
> switching.
>
> On Mon, 25 Jul 2022, 23:05 Jeremy Hansen, <farnsworth.mcfadden@xxxxxxxxx>
> wrote:
>
>> That results in packet loss:
>>
>> [root@cn01 ~]# ping -M do -s 8972 192.168.30.14
>> PING 192.168.30.14 (192.168.30.14) 8972(9000) bytes of data.
>> ^C
>> --- 192.168.30.14 ping statistics ---
>> 3 packets transmitted, 0 received, 100% packet loss, time 2062ms
>>
>> That's very weird... but this gives me something to figure out. Hmmm.
>> Thank you.
>>
>> On Mon, Jul 25, 2022 at 3:01 PM Sean Redmond <sean.redmond1@xxxxxxxxx>
>> wrote:
>>
>>> Looks good, just confirm it with a large ping with don't fragment flag
>>> set between each host.
>>>
>>> ping -M do -s 8972 [destination IP]
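To run the same don't-fragment test across the whole cluster rather than one
pair at a time, a small shell loop does the job. This is a rough sketch, not
from the thread itself; it assumes the 192.168.30.11-16 addresses shown
further down and passwordless SSH between the nodes:

    #!/usr/bin/env bash
    # Full-mesh jumbo-frame check: from every host, ping every other host with
    # an 8972-byte payload (9000 bytes on the wire) and fragmentation
    # prohibited, so any switch port or uplink that drops jumbo frames shows
    # up as a FAIL.
    hosts="192.168.30.11 192.168.30.12 192.168.30.13 192.168.30.14 192.168.30.15 192.168.30.16"
    for src in $hosts; do
      for dst in $hosts; do
        [ "$src" = "$dst" ] && continue
        if ssh "$src" ping -M do -s 8972 -c 3 -W 1 -q "$dst" >/dev/null 2>&1; then
          echo "OK   $src -> $dst"
        else
          echo "FAIL $src -> $dst"    # 9000-byte frames do not survive this path
        fi
      done
    done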
>>> On Mon, 25 Jul 2022, 22:56 Jeremy Hansen, <farnsworth.mcfadden@xxxxxxxxx>
>>> wrote:
>>>
>>>> MTU is the same across all hosts:
>>>>
>>>> --------- cn01.ceph.la1.clx.corp---------
>>>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>>>         inet 192.168.30.11  netmask 255.255.255.0  broadcast 192.168.30.255
>>>>         inet6 fe80::3e8c:f8ff:feed:728d  prefixlen 64  scopeid 0x20<link>
>>>>         ether 3c:8c:f8:ed:72:8d  txqueuelen 1000  (Ethernet)
>>>>         RX packets 3163785  bytes 2136258888 (1.9 GiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 6890933  bytes 40233267272 (37.4 GiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> --------- cn02.ceph.la1.clx.corp---------
>>>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>>>         inet 192.168.30.12  netmask 255.255.255.0  broadcast 192.168.30.255
>>>>         inet6 fe80::3e8c:f8ff:feed:ff0c  prefixlen 64  scopeid 0x20<link>
>>>>         ether 3c:8c:f8:ed:ff:0c  txqueuelen 1000  (Ethernet)
>>>>         RX packets 3976256  bytes 2761764486 (2.5 GiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 9270324  bytes 56984933585 (53.0 GiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> --------- cn03.ceph.la1.clx.corp---------
>>>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>>>         inet 192.168.30.13  netmask 255.255.255.0  broadcast 192.168.30.255
>>>>         inet6 fe80::3e8c:f8ff:feed:feba  prefixlen 64  scopeid 0x20<link>
>>>>         ether 3c:8c:f8:ed:fe:ba  txqueuelen 1000  (Ethernet)
>>>>         RX packets 13081847  bytes 93614795356 (87.1 GiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 4001854  bytes 2536322435 (2.3 GiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> --------- cn04.ceph.la1.clx.corp---------
>>>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>>>         inet 192.168.30.14  netmask 255.255.255.0  broadcast 192.168.30.255
>>>>         inet6 fe80::3e8c:f8ff:feed:6f89  prefixlen 64  scopeid 0x20<link>
>>>>         ether 3c:8c:f8:ed:6f:89  txqueuelen 1000  (Ethernet)
>>>>         RX packets 60018  bytes 5622542 (5.3 MiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 59889  bytes 17463794 (16.6 MiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> --------- cn05.ceph.la1.clx.corp---------
>>>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>>>         inet 192.168.30.15  netmask 255.255.255.0  broadcast 192.168.30.255
>>>>         inet6 fe80::3e8c:f8ff:feed:7245  prefixlen 64  scopeid 0x20<link>
>>>>         ether 3c:8c:f8:ed:72:45  txqueuelen 1000  (Ethernet)
>>>>         RX packets 69163  bytes 8085511 (7.7 MiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 73539  bytes 17069869 (16.2 MiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> --------- cn06.ceph.la1.clx.corp---------
>>>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>>>         inet 192.168.30.16  netmask 255.255.255.0  broadcast 192.168.30.255
>>>>         inet6 fe80::3e8c:f8ff:feed:feab  prefixlen 64  scopeid 0x20<link>
>>>>         ether 3c:8c:f8:ed:fe:ab  txqueuelen 1000  (Ethernet)
>>>>         RX packets 23570  bytes 2251531 (2.1 MiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 22268  bytes 16186794 (15.4 MiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> 10G.
>>>>
>>>> On Mon, Jul 25, 2022 at 2:51 PM Sean Redmond <sean.redmond1@xxxxxxxxx>
>>>> wrote:
>>>>
>>>>> Is the MTU in the new rack set correctly?
>>>>>
>>>>> On Mon, 25 Jul 2022, 11:30 Jeremy Hansen, <farnsworth.mcfadden@xxxxxxxxx>
>>>>> wrote:
>>>>>
>>>>>> I transitioned some servers to a new rack and now I'm having major
>>>>>> issues with Ceph upon bringing things back up.
>>>>>>
>>>>>> I believe the issue may be related to the Ceph nodes coming back up
>>>>>> with different IPs before the VLANs were set. That's just a guess,
>>>>>> because I can't think of any other reason this would happen.
>>>>>>
>>>>>> Current state:
>>>>>>
>>>>>> Every 2.0s: ceph -s          cn01.ceph.la1.clx.corp: Mon Jul 25 10:13:05 2022
>>>>>>
>>>>>>   cluster:
>>>>>>     id:     bfa2ad58-c049-11eb-9098-3c8cf8ed728d
>>>>>>     health: HEALTH_WARN
>>>>>>             1 filesystem is degraded
>>>>>>             2 MDSs report slow metadata IOs
>>>>>>             2/5 mons down, quorum cn02,cn03,cn01
>>>>>>             9 osds down
>>>>>>             3 hosts (17 osds) down
>>>>>>             Reduced data availability: 97 pgs inactive, 9 pgs down
>>>>>>             Degraded data redundancy: 13860144/30824413 objects
>>>>>>             degraded (44.965%), 411 pgs degraded, 482 pgs undersized
>>>>>>
>>>>>>   services:
>>>>>>     mon: 5 daemons, quorum cn02,cn03,cn01 (age 62m), out of quorum: cn05, cn04
>>>>>>     mgr: cn02.arszct(active, since 5m)
>>>>>>     mds: 2/2 daemons up, 2 standby
>>>>>>     osd: 35 osds: 15 up (since 62m), 24 in (since 58m); 222 remapped pgs
>>>>>>
>>>>>>   data:
>>>>>>     volumes: 1/2 healthy, 1 recovering
>>>>>>     pools:   8 pools, 545 pgs
>>>>>>     objects: 7.71M objects, 6.7 TiB
>>>>>>     usage:   15 TiB used, 39 TiB / 54 TiB avail
>>>>>>     pgs:     0.367% pgs unknown
>>>>>>              17.431% pgs not active
>>>>>>              13860144/30824413 objects degraded (44.965%)
>>>>>>              1137693/30824413 objects misplaced (3.691%)
>>>>>>              280 active+undersized+degraded
>>>>>>              67  undersized+degraded+remapped+backfilling+peered
>>>>>>              57  active+undersized+remapped
>>>>>>              45  active+clean+remapped
>>>>>>              44  active+undersized+degraded+remapped+backfilling
>>>>>>              18  undersized+degraded+peered
>>>>>>              10  active+undersized
>>>>>>              9   down
>>>>>>              7   active+clean
>>>>>>              3   active+undersized+remapped+backfilling
>>>>>>              2   active+undersized+degraded+remapped+backfill_wait
>>>>>>              2   unknown
>>>>>>              1   undersized+peered
>>>>>>
>>>>>>   io:
>>>>>>     client:   170 B/s rd, 0 op/s rd, 0 op/s wr
>>>>>>     recovery: 168 MiB/s, 158 keys/s, 166 objects/s
>>>>>>
>>>>>> I have to disable and re-enable the dashboard just to use it. It
>>>>>> seems to get bogged down after a few moments.
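For a cluster in this state, a few read-only status commands help pin down
exactly which OSDs and hosts are affected before anything is restarted. A
minimal sketch with the standard ceph CLI, safe to repeat while recovery runs:

    ceph health detail    # expands each HEALTH_WARN line and lists the down OSD ids
    ceph osd tree down    # only the down OSDs, grouped by host in the CRUSH tree
    ceph osd df tree      # per-OSD usage, confirms which hosts hold the affected OSDs
    ceph -w               # follow cluster events live while backfill/recovery proceeds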
>>>>>> Ceph has marked the three servers that were moved to the new rack as
>>>>>> "down", but if I run a cephadm check-host, they all seem to pass:
>>>>>>
>>>>>> ************************ ceph ************************
>>>>>> --------- cn01.ceph.---------
>>>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>>>> systemctl is present
>>>>>> lvcreate is present
>>>>>> Unit chronyd.service is enabled and running
>>>>>> Host looks OK
>>>>>> --------- cn02.ceph.---------
>>>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>>>> systemctl is present
>>>>>> lvcreate is present
>>>>>> Unit chronyd.service is enabled and running
>>>>>> Host looks OK
>>>>>> --------- cn03.ceph.---------
>>>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>>>> systemctl is present
>>>>>> lvcreate is present
>>>>>> Unit chronyd.service is enabled and running
>>>>>> Host looks OK
>>>>>> --------- cn04.ceph.---------
>>>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>>>> systemctl is present
>>>>>> lvcreate is present
>>>>>> Unit chronyd.service is enabled and running
>>>>>> Host looks OK
>>>>>> --------- cn05.ceph.---------
>>>>>> podman|docker (/usr/bin/podman) is present
>>>>>> systemctl is present
>>>>>> lvcreate is present
>>>>>> Unit chronyd.service is enabled and running
>>>>>> Host looks OK
>>>>>> --------- cn06.ceph.---------
>>>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>>>> systemctl is present
>>>>>> lvcreate is present
>>>>>> Unit chronyd.service is enabled and running
>>>>>> Host looks OK
>>>>>>
>>>>>> It seems to be recovering with what it has left, but a large number
>>>>>> of OSDs are down. When I try to restart one of the downed OSDs, I see
>>>>>> a huge dump:
>>>>>>
>>>>>> Jul 25 03:19:38 cn06.ceph
>>>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>>>> 2022-07-25T10:19:38.532+0000 7fce14a6c080  0 osd.34 30689 done with
>>>>>> init, starting boot process
>>>>>> Jul 25 03:19:38 cn06.ceph
>>>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>>>> 2022-07-25T10:19:38.532+0000 7fce14a6c080  1 osd.34 30689 start_boot
>>>>>> Jul 25 03:20:10 cn06.ceph
>>>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>>>> 2022-07-25T10:20:10.655+0000 7fcdfd12d700  1 osd.34 30689 start_boot
>>>>>> Jul 25 03:20:41 cn06.ceph
>>>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>>>> 2022-07-25T10:20:41.159+0000 7fcdfd12d700  1 osd.34 30689 start_boot
>>>>>> Jul 25 03:21:11 cn06.ceph
>>>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>>>> 2022-07-25T10:21:11.662+0000 7fcdfd12d700  1 osd.34 30689 start_boot
>>>>>>
>>>>>> At this point it just keeps printing start_boot, but the dashboard
>>>>>> has it marked as "in" but "down".
>>>>>>
>>>>>> On the three hosts that moved, a bunch of OSDs were marked "out" and
>>>>>> "down", and some "in" but "down".
>>>>>>
>>>>>> Not sure where to go from here. I'm going to let the recovery
>>>>>> continue and hope that my 4x replication on these pools saves me.
>>>>>> Any help is very much appreciated. This Ceph cluster holds all of our
>>>>>> Cloudstack images... it would be terrible to lose this data.
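Two notes for anyone who lands on this thread with the same symptoms. The
check-host output above only validates local prerequisites (podman,
systemctl, lvcreate, chrony), so it can pass on every node even while the
path between racks is dropping jumbo frames. And an OSD that logs start_boot
over and over has typically started fine locally but never gets marked up by
the monitors, which is consistent with large packets being dropped somewhere
in between. A rough sketch for poking at one of the down OSDs from its own
host; it assumes cephadm's usual systemd unit naming of
ceph-<fsid>@osd.<id>.service, with the fsid taken from the cluster id in the
ceph -s output above:

    # On cn06: which daemons does cephadm manage here, and what is osd.34 doing?
    cephadm ls
    systemctl status ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d@osd.34.service
    journalctl -u ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d@osd.34.service --since "1 hour ago"

    # Before restarting it again, confirm this host can reach its peers on the
    # 192.168.30.0/24 storage network with full-size frames:
    ping -M do -s 8972 -c 3 192.168.30.11
    ping -M do -s 8972 -c 3 192.168.30.13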