That results in packet loss:

[root@cn01 ~]# ping -M do -s 8972 192.168.30.14
PING 192.168.30.14 (192.168.30.14) 8972(9000) bytes of data.
^C
--- 192.168.30.14 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2062ms

That's very weird... but this gives me something to figure out. Hmmm.

Thank you.
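A 100% loss with the don't-fragment flag and an 8972-byte payload (8972 bytes of data plus 28 bytes of ICMP/IP headers = 9000) usually means some hop between the two hosts, often a switch port or uplink in the new rack, is still at a smaller MTU even though both NICs report 9000. A minimal sketch of the per-host test Sean suggests below, looped over the cluster addresses shown in the ifconfig output further down (the loop and flag choices are just one way to run it):

  for ip in 192.168.30.11 192.168.30.12 192.168.30.13 192.168.30.14 192.168.30.15 192.168.30.16; do
      # -M do forbids fragmentation; -s 8972 plus 28 header bytes makes a full 9000-byte frame
      ping -M do -s 8972 -c 3 -W 2 "$ip" >/dev/null 2>&1 && echo "$ip ok" || echo "$ip FAILED"
  done

Running it from each host in turn narrows down which node pairs cross the undersized link.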
On Mon, Jul 25, 2022 at 3:01 PM Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:

> Looks good, just confirm it with a large ping with the don't-fragment flag
> set between each host.
>
> ping -M do -s 8972 [destination IP]
>
> On Mon, 25 Jul 2022, 22:56 Jeremy Hansen, <farnsworth.mcfadden@xxxxxxxxx>
> wrote:
>
>> MTU is the same across all hosts:
>>
>> --------- cn01.ceph.la1.clx.corp---------
>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>         inet 192.168.30.11  netmask 255.255.255.0  broadcast 192.168.30.255
>>         inet6 fe80::3e8c:f8ff:feed:728d  prefixlen 64  scopeid 0x20<link>
>>         ether 3c:8c:f8:ed:72:8d  txqueuelen 1000  (Ethernet)
>>         RX packets 3163785  bytes 2136258888 (1.9 GiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 6890933  bytes 40233267272 (37.4 GiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>
>> --------- cn02.ceph.la1.clx.corp---------
>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>         inet 192.168.30.12  netmask 255.255.255.0  broadcast 192.168.30.255
>>         inet6 fe80::3e8c:f8ff:feed:ff0c  prefixlen 64  scopeid 0x20<link>
>>         ether 3c:8c:f8:ed:ff:0c  txqueuelen 1000  (Ethernet)
>>         RX packets 3976256  bytes 2761764486 (2.5 GiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 9270324  bytes 56984933585 (53.0 GiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>
>> --------- cn03.ceph.la1.clx.corp---------
>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>         inet 192.168.30.13  netmask 255.255.255.0  broadcast 192.168.30.255
>>         inet6 fe80::3e8c:f8ff:feed:feba  prefixlen 64  scopeid 0x20<link>
>>         ether 3c:8c:f8:ed:fe:ba  txqueuelen 1000  (Ethernet)
>>         RX packets 13081847  bytes 93614795356 (87.1 GiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 4001854  bytes 2536322435 (2.3 GiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>
>> --------- cn04.ceph.la1.clx.corp---------
>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>         inet 192.168.30.14  netmask 255.255.255.0  broadcast 192.168.30.255
>>         inet6 fe80::3e8c:f8ff:feed:6f89  prefixlen 64  scopeid 0x20<link>
>>         ether 3c:8c:f8:ed:6f:89  txqueuelen 1000  (Ethernet)
>>         RX packets 60018  bytes 5622542 (5.3 MiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 59889  bytes 17463794 (16.6 MiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>
>> --------- cn05.ceph.la1.clx.corp---------
>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>         inet 192.168.30.15  netmask 255.255.255.0  broadcast 192.168.30.255
>>         inet6 fe80::3e8c:f8ff:feed:7245  prefixlen 64  scopeid 0x20<link>
>>         ether 3c:8c:f8:ed:72:45  txqueuelen 1000  (Ethernet)
>>         RX packets 69163  bytes 8085511 (7.7 MiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 73539  bytes 17069869 (16.2 MiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>
>> --------- cn06.ceph.la1.clx.corp---------
>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>         inet 192.168.30.16  netmask 255.255.255.0  broadcast 192.168.30.255
>>         inet6 fe80::3e8c:f8ff:feed:feab  prefixlen 64  scopeid 0x20<link>
>>         ether 3c:8c:f8:ed:fe:ab  txqueuelen 1000  (Ethernet)
>>         RX packets 23570  bytes 2251531 (2.1 MiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 22268  bytes 16186794 (15.4 MiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>
>> 10G.
>>
>> On Mon, Jul 25, 2022 at 2:51 PM Sean Redmond <sean.redmond1@xxxxxxxxx>
>> wrote:
>>
>>> Is the MTU in the new rack set correctly?
>>>
>>> On Mon, 25 Jul 2022, 11:30 Jeremy Hansen, <farnsworth.mcfadden@xxxxxxxxx>
>>> wrote:
>>>
>>>> I transitioned some servers to a new rack and now I'm having major
>>>> issues with Ceph upon bringing things back up.
>>>>
>>>> I believe the issue may be related to the Ceph nodes coming back up
>>>> with different IPs before VLANs were set. That's just a guess because
>>>> I can't think of any other reason this would happen.
>>>>
>>>> Current state:
>>>>
>>>> Every 2.0s: ceph -s          cn01.ceph.la1.clx.corp: Mon Jul 25 10:13:05 2022
>>>>
>>>>   cluster:
>>>>     id:     bfa2ad58-c049-11eb-9098-3c8cf8ed728d
>>>>     health: HEALTH_WARN
>>>>             1 filesystem is degraded
>>>>             2 MDSs report slow metadata IOs
>>>>             2/5 mons down, quorum cn02,cn03,cn01
>>>>             9 osds down
>>>>             3 hosts (17 osds) down
>>>>             Reduced data availability: 97 pgs inactive, 9 pgs down
>>>>             Degraded data redundancy: 13860144/30824413 objects
>>>> degraded (44.965%), 411 pgs degraded, 482 pgs undersized
>>>>
>>>>   services:
>>>>     mon: 5 daemons, quorum cn02,cn03,cn01 (age 62m), out of quorum: cn05, cn04
>>>>     mgr: cn02.arszct(active, since 5m)
>>>>     mds: 2/2 daemons up, 2 standby
>>>>     osd: 35 osds: 15 up (since 62m), 24 in (since 58m); 222 remapped pgs
>>>>
>>>>   data:
>>>>     volumes: 1/2 healthy, 1 recovering
>>>>     pools:   8 pools, 545 pgs
>>>>     objects: 7.71M objects, 6.7 TiB
>>>>     usage:   15 TiB used, 39 TiB / 54 TiB avail
>>>>     pgs:     0.367% pgs unknown
>>>>              17.431% pgs not active
>>>>              13860144/30824413 objects degraded (44.965%)
>>>>              1137693/30824413 objects misplaced (3.691%)
>>>>              280 active+undersized+degraded
>>>>              67  undersized+degraded+remapped+backfilling+peered
>>>>              57  active+undersized+remapped
>>>>              45  active+clean+remapped
>>>>              44  active+undersized+degraded+remapped+backfilling
>>>>              18  undersized+degraded+peered
>>>>              10  active+undersized
>>>>              9   down
>>>>              7   active+clean
>>>>              3   active+undersized+remapped+backfilling
>>>>              2   active+undersized+degraded+remapped+backfill_wait
>>>>              2   unknown
>>>>              1   undersized+peered
>>>>
>>>>   io:
>>>>     client:   170 B/s rd, 0 op/s rd, 0 op/s wr
>>>>     recovery: 168 MiB/s, 158 keys/s, 166 objects/s
>>>>
>>>> I have to disable and re-enable the dashboard just to use it. It seems
>>>> to get bogged down after a few moments.
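Given the suspicion above that the nodes briefly came up with different IPs before the VLANs were set, it is worth comparing the addresses the cluster still has on record with what the moved hosts are using now. A rough sketch of the checks, assuming they are run from one of the nodes that still has quorum (cn01/cn02/cn03 per the status output):

  ceph mon dump                        # monitor addresses in the monmap vs. the hosts' current IPs
  ceph orch host ls                    # addresses cephadm has recorded for each host
  ceph osd dump | grep '^osd\.'        # public/cluster addresses each OSD registered
  ceph config get mon public_network   # the network the daemons are expected to bind to

If any of these still show the temporary addresses, that would go a long way toward explaining the monitors out of quorum and the OSDs reported down.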
>>>>
>>>> Ceph has marked the three servers that were moved to the new rack as
>>>> "down", but if I do a cephadm host-check, they all seem to pass:
>>>>
>>>> ************************ ceph ************************
>>>> --------- cn01.ceph.---------
>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>> systemctl is present
>>>> lvcreate is present
>>>> Unit chronyd.service is enabled and running
>>>> Host looks OK
>>>> --------- cn02.ceph.---------
>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>> systemctl is present
>>>> lvcreate is present
>>>> Unit chronyd.service is enabled and running
>>>> Host looks OK
>>>> --------- cn03.ceph.---------
>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>> systemctl is present
>>>> lvcreate is present
>>>> Unit chronyd.service is enabled and running
>>>> Host looks OK
>>>> --------- cn04.ceph.---------
>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>> systemctl is present
>>>> lvcreate is present
>>>> Unit chronyd.service is enabled and running
>>>> Host looks OK
>>>> --------- cn05.ceph.---------
>>>> podman|docker (/usr/bin/podman) is present
>>>> systemctl is present
>>>> lvcreate is present
>>>> Unit chronyd.service is enabled and running
>>>> Host looks OK
>>>> --------- cn06.ceph.---------
>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>> systemctl is present
>>>> lvcreate is present
>>>> Unit chronyd.service is enabled and running
>>>> Host looks OK
>>>>
>>>> It seems to be recovering with what it has left, but a large number of
>>>> OSDs are down. When trying to restart one of the downed OSDs, I see a
>>>> huge dump.
>>>>
>>>> Jul 25 03:19:38 cn06.ceph
>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>> 2022-07-25T10:19:38.532+0000 7fce14a6c080  0 osd.34 30689 done with
>>>> init, starting boot process
>>>> Jul 25 03:19:38 cn06.ceph
>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>> 2022-07-25T10:19:38.532+0000 7fce14a6c080  1 osd.34 30689 start_boot
>>>> Jul 25 03:20:10 cn06.ceph
>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>> 2022-07-25T10:20:10.655+0000 7fcdfd12d700  1 osd.34 30689 start_boot
>>>> Jul 25 03:20:41 cn06.ceph
>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>> 2022-07-25T10:20:41.159+0000 7fcdfd12d700  1 osd.34 30689 start_boot
>>>> Jul 25 03:21:11 cn06.ceph
>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>> 2022-07-25T10:21:11.662+0000 7fcdfd12d700  1 osd.34 30689 start_boot
>>>>
>>>> At this point it just keeps printing start_boot, but the dashboard has
>>>> it marked as "in" but "down".
>>>>
>>>> On the three hosts that moved, a bunch of OSDs were marked "out" and
>>>> "down", and some "in" but "down".
>>>>
>>>> Not sure where to go from here. I'm going to let the recovery continue
>>>> and hope that my 4x replication on these pools saves me. Any help is
>>>> very much appreciated. This Ceph cluster holds all of our CloudStack
>>>> images... it would be terrible to lose this data.
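An OSD that keeps logging start_boot without ever being marked up is typically still waiting for the monitors to acknowledge its boot message, which fits a host that cannot reach the monitors (or be reached back) on the expected network or at the expected MTU. A rough sketch of checks for osd.34 on cn06; the <mon-ip> values are placeholders to be filled in from `ceph mon dump`, not addresses taken from this thread:

  # run from any node with an admin keyring (e.g. inside "cephadm shell")
  ceph osd metadata 34 | grep -E 'addr|hostname'   # the front/back addresses osd.34 registered

  # then, from cn06 itself, confirm the monitor ports accept connections
  # (nc -z is a connect-only test; any TCP connect check works)
  for m in <mon-ip-1> <mon-ip-2> <mon-ip-3>; do
      nc -vz -w 3 "$m" 3300   # msgr v2 monitor port
      nc -vz -w 3 "$m" 6789   # msgr v1 monitor port
  done

Pairing that with the don't-fragment ping test above, aimed at the same monitor addresses, should separate a plain reachability problem from an MTU mismatch.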