I use Ubiquiti equipment, mainly because I'm not a network admin... I
rebooted the 10G switches and now everything is working and recovering. I
hate when there's not a definitive answer, but that's kind of the deal when
you use Ubiquiti stuff. Thank you Sean and Frank. Frank, you were right.
It made no sense because from a very basic point of view the network
seemed fine, but Sean's ping revealed that it clearly wasn't. Thank you!

-jeremy

On Mon, Jul 25, 2022 at 3:08 PM Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:

> Yea, assuming you can ping with a lower MTU, check the MTU on your
> switching.
>
> On Mon, 25 Jul 2022, 23:05 Jeremy Hansen, <farnsworth.mcfadden@xxxxxxxxx>
> wrote:
>
>> That results in packet loss:
>>
>> [root@cn01 ~]# ping -M do -s 8972 192.168.30.14
>> PING 192.168.30.14 (192.168.30.14) 8972(9000) bytes of data.
>> ^C
>> --- 192.168.30.14 ping statistics ---
>> 3 packets transmitted, 0 received, 100% packet loss, time 2062ms
>>
>> That's very weird... but this gives me something to figure out. Hmmm.
>> Thank you.
>>
>> On Mon, Jul 25, 2022 at 3:01 PM Sean Redmond <sean.redmond1@xxxxxxxxx>
>> wrote:
>>
>>> Looks good, just confirm it with a large ping with don't fragment flag
>>> set between each host.
>>>
>>> ping -M do -s 8972 [destination IP]
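To run the same don't-fragment test across the whole cluster rather than one
pair at a time, a small shell loop does the job. This is a rough sketch, not
from the thread itself; it assumes the 192.168.30.11-16 addresses shown
further down and passwordless SSH between the nodes:

    #!/usr/bin/env bash
    # Full-mesh jumbo-frame check: from every host, ping every other host with
    # an 8972-byte payload (9000 bytes on the wire) and fragmentation
    # prohibited, so any switch port or uplink that drops jumbo frames shows
    # up as a FAIL.
    hosts="192.168.30.11 192.168.30.12 192.168.30.13 192.168.30.14 192.168.30.15 192.168.30.16"
    for src in $hosts; do
      for dst in $hosts; do
        [ "$src" = "$dst" ] && continue
        if ssh "$src" ping -M do -s 8972 -c 3 -W 1 -q "$dst" >/dev/null 2>&1; then
          echo "OK   $src -> $dst"
        else
          echo "FAIL $src -> $dst"    # 9000-byte frames do not survive this path
        fi
      done
    done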
>>> On Mon, 25 Jul 2022, 22:56 Jeremy Hansen, <farnsworth.mcfadden@xxxxxxxxx>
>>> wrote:
>>>
>>>> MTU is the same across all hosts:
>>>>
>>>> --------- cn01.ceph.la1.clx.corp---------
>>>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>>>         inet 192.168.30.11  netmask 255.255.255.0  broadcast 192.168.30.255
>>>>         inet6 fe80::3e8c:f8ff:feed:728d  prefixlen 64  scopeid 0x20<link>
>>>>         ether 3c:8c:f8:ed:72:8d  txqueuelen 1000  (Ethernet)
>>>>         RX packets 3163785  bytes 2136258888 (1.9 GiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 6890933  bytes 40233267272 (37.4 GiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> --------- cn02.ceph.la1.clx.corp---------
>>>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>>>         inet 192.168.30.12  netmask 255.255.255.0  broadcast 192.168.30.255
>>>>         inet6 fe80::3e8c:f8ff:feed:ff0c  prefixlen 64  scopeid 0x20<link>
>>>>         ether 3c:8c:f8:ed:ff:0c  txqueuelen 1000  (Ethernet)
>>>>         RX packets 3976256  bytes 2761764486 (2.5 GiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 9270324  bytes 56984933585 (53.0 GiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> --------- cn03.ceph.la1.clx.corp---------
>>>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>>>         inet 192.168.30.13  netmask 255.255.255.0  broadcast 192.168.30.255
>>>>         inet6 fe80::3e8c:f8ff:feed:feba  prefixlen 64  scopeid 0x20<link>
>>>>         ether 3c:8c:f8:ed:fe:ba  txqueuelen 1000  (Ethernet)
>>>>         RX packets 13081847  bytes 93614795356 (87.1 GiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 4001854  bytes 2536322435 (2.3 GiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> --------- cn04.ceph.la1.clx.corp---------
>>>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>>>         inet 192.168.30.14  netmask 255.255.255.0  broadcast 192.168.30.255
>>>>         inet6 fe80::3e8c:f8ff:feed:6f89  prefixlen 64  scopeid 0x20<link>
>>>>         ether 3c:8c:f8:ed:6f:89  txqueuelen 1000  (Ethernet)
>>>>         RX packets 60018  bytes 5622542 (5.3 MiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 59889  bytes 17463794 (16.6 MiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> --------- cn05.ceph.la1.clx.corp---------
>>>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>>>         inet 192.168.30.15  netmask 255.255.255.0  broadcast 192.168.30.255
>>>>         inet6 fe80::3e8c:f8ff:feed:7245  prefixlen 64  scopeid 0x20<link>
>>>>         ether 3c:8c:f8:ed:72:45  txqueuelen 1000  (Ethernet)
>>>>         RX packets 69163  bytes 8085511 (7.7 MiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 73539  bytes 17069869 (16.2 MiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> --------- cn06.ceph.la1.clx.corp---------
>>>> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>>>         inet 192.168.30.16  netmask 255.255.255.0  broadcast 192.168.30.255
>>>>         inet6 fe80::3e8c:f8ff:feed:feab  prefixlen 64  scopeid 0x20<link>
>>>>         ether 3c:8c:f8:ed:fe:ab  txqueuelen 1000  (Ethernet)
>>>>         RX packets 23570  bytes 2251531 (2.1 MiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 22268  bytes 16186794 (15.4 MiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> 10G.
>>>>
>>>> On Mon, Jul 25, 2022 at 2:51 PM Sean Redmond <sean.redmond1@xxxxxxxxx>
>>>> wrote:
>>>>
>>>>> Is the MTU in the new rack set correctly?
>>>>>
>>>>> On Mon, 25 Jul 2022, 11:30 Jeremy Hansen, <farnsworth.mcfadden@xxxxxxxxx>
>>>>> wrote:
>>>>>
>>>>>> I transitioned some servers to a new rack and now I'm having major
>>>>>> issues with Ceph upon bringing things back up.
>>>>>>
>>>>>> I believe the issue may be related to the Ceph nodes coming back up
>>>>>> with different IPs before the VLANs were set. That's just a guess,
>>>>>> because I can't think of any other reason this would happen.
>>>>>>
>>>>>> Current state:
>>>>>>
>>>>>> Every 2.0s: ceph -s          cn01.ceph.la1.clx.corp: Mon Jul 25 10:13:05 2022
>>>>>>
>>>>>>   cluster:
>>>>>>     id:     bfa2ad58-c049-11eb-9098-3c8cf8ed728d
>>>>>>     health: HEALTH_WARN
>>>>>>             1 filesystem is degraded
>>>>>>             2 MDSs report slow metadata IOs
>>>>>>             2/5 mons down, quorum cn02,cn03,cn01
>>>>>>             9 osds down
>>>>>>             3 hosts (17 osds) down
>>>>>>             Reduced data availability: 97 pgs inactive, 9 pgs down
>>>>>>             Degraded data redundancy: 13860144/30824413 objects
>>>>>>             degraded (44.965%), 411 pgs degraded, 482 pgs undersized
>>>>>>
>>>>>>   services:
>>>>>>     mon: 5 daemons, quorum cn02,cn03,cn01 (age 62m), out of quorum: cn05, cn04
>>>>>>     mgr: cn02.arszct(active, since 5m)
>>>>>>     mds: 2/2 daemons up, 2 standby
>>>>>>     osd: 35 osds: 15 up (since 62m), 24 in (since 58m); 222 remapped pgs
>>>>>>
>>>>>>   data:
>>>>>>     volumes: 1/2 healthy, 1 recovering
>>>>>>     pools:   8 pools, 545 pgs
>>>>>>     objects: 7.71M objects, 6.7 TiB
>>>>>>     usage:   15 TiB used, 39 TiB / 54 TiB avail
>>>>>>     pgs:     0.367% pgs unknown
>>>>>>              17.431% pgs not active
>>>>>>              13860144/30824413 objects degraded (44.965%)
>>>>>>              1137693/30824413 objects misplaced (3.691%)
>>>>>>              280 active+undersized+degraded
>>>>>>              67  undersized+degraded+remapped+backfilling+peered
>>>>>>              57  active+undersized+remapped
>>>>>>              45  active+clean+remapped
>>>>>>              44  active+undersized+degraded+remapped+backfilling
>>>>>>              18  undersized+degraded+peered
>>>>>>              10  active+undersized
>>>>>>              9   down
>>>>>>              7   active+clean
>>>>>>              3   active+undersized+remapped+backfilling
>>>>>>              2   active+undersized+degraded+remapped+backfill_wait
>>>>>>              2   unknown
>>>>>>              1   undersized+peered
>>>>>>
>>>>>>   io:
>>>>>>     client:   170 B/s rd, 0 op/s rd, 0 op/s wr
>>>>>>     recovery: 168 MiB/s, 158 keys/s, 166 objects/s
>>>>>>
>>>>>> I have to disable and re-enable the dashboard just to use it. It
>>>>>> seems to get bogged down after a few moments.
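For a cluster in this state, a few read-only status commands help pin down
exactly which OSDs and hosts are affected before anything is restarted. A
minimal sketch with the standard ceph CLI, safe to repeat while recovery runs:

    ceph health detail    # expands each HEALTH_WARN line and lists the down OSD ids
    ceph osd tree down    # only the down OSDs, grouped by host in the CRUSH tree
    ceph osd df tree      # per-OSD usage, confirms which hosts hold the affected OSDs
    ceph -w               # follow cluster events live while backfill/recovery proceeds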
>>>>>> Ceph has marked the three servers that were moved to the new rack as
>>>>>> "down", but if I run a cephadm check-host, they all seem to pass:
>>>>>>
>>>>>> ************************ ceph ************************
>>>>>> --------- cn01.ceph.---------
>>>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>>>> systemctl is present
>>>>>> lvcreate is present
>>>>>> Unit chronyd.service is enabled and running
>>>>>> Host looks OK
>>>>>> --------- cn02.ceph.---------
>>>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>>>> systemctl is present
>>>>>> lvcreate is present
>>>>>> Unit chronyd.service is enabled and running
>>>>>> Host looks OK
>>>>>> --------- cn03.ceph.---------
>>>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>>>> systemctl is present
>>>>>> lvcreate is present
>>>>>> Unit chronyd.service is enabled and running
>>>>>> Host looks OK
>>>>>> --------- cn04.ceph.---------
>>>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>>>> systemctl is present
>>>>>> lvcreate is present
>>>>>> Unit chronyd.service is enabled and running
>>>>>> Host looks OK
>>>>>> --------- cn05.ceph.---------
>>>>>> podman|docker (/usr/bin/podman) is present
>>>>>> systemctl is present
>>>>>> lvcreate is present
>>>>>> Unit chronyd.service is enabled and running
>>>>>> Host looks OK
>>>>>> --------- cn06.ceph.---------
>>>>>> podman (/usr/bin/podman) version 4.0.2 is present
>>>>>> systemctl is present
>>>>>> lvcreate is present
>>>>>> Unit chronyd.service is enabled and running
>>>>>> Host looks OK
>>>>>>
>>>>>> It seems to be recovering with what it has left, but a large number
>>>>>> of OSDs are down. When I try to restart one of the downed OSDs, I see
>>>>>> a huge dump:
>>>>>>
>>>>>> Jul 25 03:19:38 cn06.ceph
>>>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>>>> 2022-07-25T10:19:38.532+0000 7fce14a6c080  0 osd.34 30689 done with
>>>>>> init, starting boot process
>>>>>> Jul 25 03:19:38 cn06.ceph
>>>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>>>> 2022-07-25T10:19:38.532+0000 7fce14a6c080  1 osd.34 30689 start_boot
>>>>>> Jul 25 03:20:10 cn06.ceph
>>>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>>>> 2022-07-25T10:20:10.655+0000 7fcdfd12d700  1 osd.34 30689 start_boot
>>>>>> Jul 25 03:20:41 cn06.ceph
>>>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>>>> 2022-07-25T10:20:41.159+0000 7fcdfd12d700  1 osd.34 30689 start_boot
>>>>>> Jul 25 03:21:11 cn06.ceph
>>>>>> ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
>>>>>> 2022-07-25T10:21:11.662+0000 7fcdfd12d700  1 osd.34 30689 start_boot
>>>>>>
>>>>>> At this point it just keeps printing start_boot, but the dashboard
>>>>>> has it marked as "in" but "down".
>>>>>>
>>>>>> On the three hosts that moved, a bunch of OSDs were marked "out" and
>>>>>> "down", and some "in" but "down".
>>>>>>
>>>>>> Not sure where to go from here. I'm going to let the recovery
>>>>>> continue and hope that my 4x replication on these pools saves me.
>>>>>> Any help is very much appreciated. This Ceph cluster holds all of our
>>>>>> Cloudstack images... it would be terrible to lose this data.
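Two notes for anyone who lands on this thread with the same symptoms. The
check-host output above only validates local prerequisites (podman,
systemctl, lvcreate, chrony), so it can pass on every node even while the
path between racks is dropping jumbo frames. And an OSD that logs start_boot
over and over has typically started fine locally but never gets marked up by
the monitors, which is consistent with large packets being dropped somewhere
in between. A rough sketch for poking at one of the down OSDs from its own
host; it assumes cephadm's usual systemd unit naming of
ceph-<fsid>@osd.<id>.service, with the fsid taken from the cluster id in the
ceph -s output above:

    # On cn06: which daemons does cephadm manage here, and what is osd.34 doing?
    cephadm ls
    systemctl status ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d@osd.34.service
    journalctl -u ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d@osd.34.service --since "1 hour ago"

    # Before restarting it again, confirm this host can reach its peers on the
    # 192.168.30.0/24 storage network with full-size frames:
    ping -M do -s 8972 -c 3 192.168.30.11
    ping -M do -s 8972 -c 3 192.168.30.13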