Does Ceph do any kind of I/O fencing if it notices an anomaly? Do I need to
do something to re-enable these hosts if they get marked as bad?

On Mon, Jul 25, 2022 at 2:56 PM Jeremy Hansen <farnsworth.mcfadden@xxxxxxxxx> wrote:

> MTU is the same across all hosts:
>
> --------- cn01.ceph.la1.clx.corp---------
> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>         inet 192.168.30.11  netmask 255.255.255.0  broadcast 192.168.30.255
>         inet6 fe80::3e8c:f8ff:feed:728d  prefixlen 64  scopeid 0x20<link>
>         ether 3c:8c:f8:ed:72:8d  txqueuelen 1000  (Ethernet)
>         RX packets 3163785  bytes 2136258888 (1.9 GiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 6890933  bytes 40233267272 (37.4 GiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> --------- cn02.ceph.la1.clx.corp---------
> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>         inet 192.168.30.12  netmask 255.255.255.0  broadcast 192.168.30.255
>         inet6 fe80::3e8c:f8ff:feed:ff0c  prefixlen 64  scopeid 0x20<link>
>         ether 3c:8c:f8:ed:ff:0c  txqueuelen 1000  (Ethernet)
>         RX packets 3976256  bytes 2761764486 (2.5 GiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 9270324  bytes 56984933585 (53.0 GiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> --------- cn03.ceph.la1.clx.corp---------
> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>         inet 192.168.30.13  netmask 255.255.255.0  broadcast 192.168.30.255
>         inet6 fe80::3e8c:f8ff:feed:feba  prefixlen 64  scopeid 0x20<link>
>         ether 3c:8c:f8:ed:fe:ba  txqueuelen 1000  (Ethernet)
>         RX packets 13081847  bytes 93614795356 (87.1 GiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 4001854  bytes 2536322435 (2.3 GiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> --------- cn04.ceph.la1.clx.corp---------
> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>         inet 192.168.30.14  netmask 255.255.255.0  broadcast 192.168.30.255
>         inet6 fe80::3e8c:f8ff:feed:6f89  prefixlen 64  scopeid 0x20<link>
>         ether 3c:8c:f8:ed:6f:89  txqueuelen 1000  (Ethernet)
>         RX packets 60018  bytes 5622542 (5.3 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 59889  bytes 17463794 (16.6 MiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> --------- cn05.ceph.la1.clx.corp---------
> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>         inet 192.168.30.15  netmask 255.255.255.0  broadcast 192.168.30.255
>         inet6 fe80::3e8c:f8ff:feed:7245  prefixlen 64  scopeid 0x20<link>
>         ether 3c:8c:f8:ed:72:45  txqueuelen 1000  (Ethernet)
>         RX packets 69163  bytes 8085511 (7.7 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 73539  bytes 17069869 (16.2 MiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> --------- cn06.ceph.la1.clx.corp---------
> enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>         inet 192.168.30.16  netmask 255.255.255.0  broadcast 192.168.30.255
>         inet6 fe80::3e8c:f8ff:feed:feab  prefixlen 64  scopeid 0x20<link>
>         ether 3c:8c:f8:ed:fe:ab  txqueuelen 1000  (Ethernet)
>         RX packets 23570  bytes 2251531 (2.1 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 22268  bytes 16186794 (15.4 MiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> 10G.
>
> On Mon, Jul 25, 2022 at 2:51 PM Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
>
>> Is the MTU in the new rack set correctly?
>>
>> On Mon, 25 Jul 2022, 11:30 Jeremy Hansen, <farnsworth.mcfadden@xxxxxxxxx> wrote:
>>
>>> I transitioned some servers to a new rack and now I'm having major
>>> issues with Ceph upon bringing things back up.
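One extra check worth doing on the MTU point above: the interfaces all report
mtu 9000, but that only proves the local setting, not that jumbo frames
actually make it through the new rack's switching. A quick end-to-end test is
a don't-fragment ping sized for a 9000-byte MTU (just a sketch, assuming the
usual Linux iputils ping; the target here is cn06's address from the output
above, picked only as an example of a moved host):

    # 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -M do forbids fragmentation
    ping -M do -s 8972 -c 3 192.168.30.16

If that fails while a plain ping works, something in the new rack is dropping
jumbo frames, which could explain OSD heartbeats and peering breaking even
though every interface shows mtu 9000.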
>>>
>>> I believe the issue may be related to the Ceph nodes coming back up
>>> with different IPs before the VLANs were set. That's just a guess,
>>> because I can't think of any other reason this would happen.
>>>
>>> Current state:
>>>
>>> Every 2.0s: ceph -s
>>> cn01.ceph.la1.clx.corp: Mon Jul 25 10:13:05 2022
>>>
>>>   cluster:
>>>     id:     bfa2ad58-c049-11eb-9098-3c8cf8ed728d
>>>     health: HEALTH_WARN
>>>             1 filesystem is degraded
>>>             2 MDSs report slow metadata IOs
>>>             2/5 mons down, quorum cn02,cn03,cn01
>>>             9 osds down
>>>             3 hosts (17 osds) down
>>>             Reduced data availability: 97 pgs inactive, 9 pgs down
>>>             Degraded data redundancy: 13860144/30824413 objects degraded (44.965%), 411 pgs degraded, 482 pgs undersized
>>>
>>>   services:
>>>     mon: 5 daemons, quorum cn02,cn03,cn01 (age 62m), out of quorum: cn05, cn04
>>>     mgr: cn02.arszct(active, since 5m)
>>>     mds: 2/2 daemons up, 2 standby
>>>     osd: 35 osds: 15 up (since 62m), 24 in (since 58m); 222 remapped pgs
>>>
>>>   data:
>>>     volumes: 1/2 healthy, 1 recovering
>>>     pools:   8 pools, 545 pgs
>>>     objects: 7.71M objects, 6.7 TiB
>>>     usage:   15 TiB used, 39 TiB / 54 TiB avail
>>>     pgs:     0.367% pgs unknown
>>>              17.431% pgs not active
>>>              13860144/30824413 objects degraded (44.965%)
>>>              1137693/30824413 objects misplaced (3.691%)
>>>              280 active+undersized+degraded
>>>              67  undersized+degraded+remapped+backfilling+peered
>>>              57  active+undersized+remapped
>>>              45  active+clean+remapped
>>>              44  active+undersized+degraded+remapped+backfilling
>>>              18  undersized+degraded+peered
>>>              10  active+undersized
>>>              9   down
>>>              7   active+clean
>>>              3   active+undersized+remapped+backfilling
>>>              2   active+undersized+degraded+remapped+backfill_wait
>>>              2   unknown
>>>              1   undersized+peered
>>>
>>>   io:
>>>     client:   170 B/s rd, 0 op/s rd, 0 op/s wr
>>>     recovery: 168 MiB/s, 158 keys/s, 166 objects/s
>>>
>>> I have to disable and re-enable the dashboard just to use it. It seems
>>> to get bogged down after a few moments.
>>>
>>> Ceph has marked the three servers that were moved to the new rack as
>>> "Down", but if I do a cephadm host-check, they all seem to pass:
>>>
>>> ************************ ceph ************************
>>> --------- cn01.ceph.---------
>>> podman (/usr/bin/podman) version 4.0.2 is present
>>> systemctl is present
>>> lvcreate is present
>>> Unit chronyd.service is enabled and running
>>> Host looks OK
>>> --------- cn02.ceph.---------
>>> podman (/usr/bin/podman) version 4.0.2 is present
>>> systemctl is present
>>> lvcreate is present
>>> Unit chronyd.service is enabled and running
>>> Host looks OK
>>> --------- cn03.ceph.---------
>>> podman (/usr/bin/podman) version 4.0.2 is present
>>> systemctl is present
>>> lvcreate is present
>>> Unit chronyd.service is enabled and running
>>> Host looks OK
>>> --------- cn04.ceph.---------
>>> podman (/usr/bin/podman) version 4.0.2 is present
>>> systemctl is present
>>> lvcreate is present
>>> Unit chronyd.service is enabled and running
>>> Host looks OK
>>> --------- cn05.ceph.---------
>>> podman|docker (/usr/bin/podman) is present
>>> systemctl is present
>>> lvcreate is present
>>> Unit chronyd.service is enabled and running
>>> Host looks OK
>>> --------- cn06.ceph.---------
>>> podman (/usr/bin/podman) version 4.0.2 is present
>>> systemctl is present
>>> lvcreate is present
>>> Unit chronyd.service is enabled and running
>>> Host looks OK
>>>
>>> The cluster seems to be recovering with what it has left, but a large
>>> number of OSDs are down. When I try to restart one of the downed OSDs,
>>> I see a huge dump:
>>>
>>> Jul 25 03:19:38 cn06.ceph ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug 2022-07-25T10:19:38.532+0000 7fce14a6c080 0 osd.34 30689 done with init, starting boot process
>>> Jul 25 03:19:38 cn06.ceph ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug 2022-07-25T10:19:38.532+0000 7fce14a6c080 1 osd.34 30689 start_boot
>>> Jul 25 03:20:10 cn06.ceph ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug 2022-07-25T10:20:10.655+0000 7fcdfd12d700 1 osd.34 30689 start_boot
>>> Jul 25 03:20:41 cn06.ceph ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug 2022-07-25T10:20:41.159+0000 7fcdfd12d700 1 osd.34 30689 start_boot
>>> Jul 25 03:21:11 cn06.ceph ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug 2022-07-25T10:21:11.662+0000 7fcdfd12d700 1 osd.34 30689 start_boot
>>>
>>> At this point it just keeps printing start_boot, but the dashboard has
>>> it marked as "in" but "down".
>>>
>>> On the three hosts that moved, a number of OSDs were marked "out" and
>>> "down", and some are "in" but "down".
>>>
>>> Not sure where to go next. I'm going to let the recovery continue and
>>> hope that my 4x replication on these pools saves me.
>>>
>>> Any help is very much appreciated. This Ceph cluster holds all of our
>>> CloudStack images... it would be terrible to lose this data.
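On the osd.34 start_boot loop above: an OSD that keeps logging start_boot is
usually running but unable to register with the monitors, which again points
at networking between the moved hosts and the mons rather than at the OSD
itself. A few checks worth running (a sketch only; it assumes nc is installed
on the hosts, and uses cn01's address from the interface output above as the
quorum mon to test against from cn06):

    # what the cluster currently considers down, and why
    ceph health detail
    ceph osd tree down

    # the monitor addresses the OSDs are expected to reach
    ceph mon dump

    # from cn06: can we reach a quorum mon on the msgr2 (3300) and msgr1 (6789) ports?
    nc -zv 192.168.30.11 3300
    nc -zv 192.168.30.11 6789

    # which networks the daemons are configured to use
    ceph config get mon public_network
    ceph config get osd cluster_network

If the mon ports aren't reachable from the moved hosts on the expected
network, that alone would keep those OSDs down and the two mons out of quorum.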
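On the fencing/re-enabling question at the top: my understanding (happy to be
corrected) is that Ceph doesn't fence hosts in a way that needs a manual
unban. The monitors mark unreachable OSDs down, and after
mon_osd_down_out_interval (600 seconds by default) they are also marked out
so recovery can begin. Once an OSD can reach the monitors again it should
boot, be marked up, and, if it was auto-marked out, come back in on its own.
For anything that stays out or stuck once connectivity is fixed, a sketch of
what to try (osd.34 is just the example from the log above):

    # make sure no cluster-wide flags (noup, noin, ...) are blocking recovery
    ceph osd dump | grep flags

    # put a recovered OSD that is still "out" back into the data distribution
    ceph osd in osd.34

    # with cephadm, restart a stuck OSD daemon
    ceph orch daemon restart osd.34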