OSDs (v17.2.3) won't start after Rocky Upgrade to Kernel 4.18.0-372.26.1.el8_6.x86_64


Hello list members,

After upgrading from Octopus to Quincy yesterday, we now have a problem
starting OSDs on the newest Rocky 8.6 kernel, 4.18.0-372.26.1.el8_6.x86_64.

This is a non-cephadm cluster. All nodes run Rocky with kernel
4.18.0-372.19.1.el8_6.x86_64, except for this one host (ceph1n012), which I
rebooted recently.

After rebooting the host into the new kernel, no OSD tmpfs mounts were
visible under /var/lib/ceph/osd/ceph-xy.
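
For anyone who wants to reproduce this observation, here is a minimal sketch
of the check, assuming the OSD directories are normally tmpfs mounts listed in
/proc/mounts. The names tmpfs_osd_dirs and report are hypothetical helpers
written for this post, not part of Ceph.

```python
import os

OSD_ROOT = "/var/lib/ceph/osd"  # path taken from the post


def tmpfs_osd_dirs(mounts_text, root=OSD_ROOT):
    """Return the set of mount points under `root` whose filesystem type is tmpfs.

    `mounts_text` is the content of /proc/mounts: one mount per line, with the
    fields device, mount point, filesystem type, options, dump, pass.
    """
    found = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        mount_point, fs_type = fields[1], fields[2]
        if fs_type == "tmpfs" and mount_point.startswith(root + "/"):
            found.add(mount_point)
    return found


def report(root=OSD_ROOT, mounts_path="/proc/mounts"):
    """Print, per OSD directory, whether it is a tmpfs mount and has a keyring."""
    with open(mounts_path) as fh:
        mounted = tmpfs_osd_dirs(fh.read(), root)
    if not os.path.isdir(root):
        print(f"{root} does not exist")
        return
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        has_keyring = os.path.exists(os.path.join(path, "keyring"))
        print(f"{path}: tmpfs={path in mounted} keyring={has_keyring}")
```

Calling report() on the affected host prints one line per directory under
/var/lib/ceph/osd, which makes it easy to compare it with a healthy node.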

-- Unit ceph-osd@26.service has begun starting up.
Sep 29 15:33:21 ceph1n012 systemd[1]: Started Ceph object storage daemon osd.26.
-- Subject: Unit ceph-osd@26.service has finished start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Unit ceph-osd@26.service has finished starting up.
-- The start-up result is done.
Sep 29 15:33:21 ceph1n012 ceph-osd[2258]: 2022-09-29T15:33:21.754+0200 7f49126663c0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-26/keyring: (2) No such file or directory
Sep 29 15:33:21 ceph1n012 ceph-osd[2258]: 2022-09-29T15:33:21.754+0200 7f49126663c0 -1 AuthRegistry(0x56100fdb4140) no keyring found at /var/lib/ceph/osd/ceph-26/keyring, disabling cephx
Sep 29 15:33:21 ceph1n012 ceph-osd[2258]: 2022-09-29T15:33:21.755+0200 7f49126663c0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-26/keyring: (2) No such file or directory
Sep 29 15:33:21 ceph1n012 ceph-osd[2258]: 2022-09-29T15:33:21.755+0200 7f49126663c0 -1 AuthRegistry(0x7fff3eb59610) no keyring found at /var/lib/ceph/osd/ceph-26/keyring, disabling cephx
Sep 29 15:33:21 ceph1n012 ceph-osd[2258]: failed to fetch mon config (--no-mon-config to skip)
Sep 29 15:33:21 ceph1n012 systemd[1]: ceph-osd@26.service: Main process exited, code=exited, status=1/FAILURE
Sep 29 15:33:21 ceph1n012 systemd[1]: ceph-osd@26.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- The unit ceph-osd@26.service has entered the 'failed' state with result 'exit-code'.
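
To see which OSDs hit the same keyring error, the journal text can be scanned
with a short script. This is only a sketch: the function name
osds_missing_keyring is hypothetical, and the pattern assumes the message
wording shown in the excerpt above.

```python
import re

# Matches the "unable to find a keyring" message quoted above and captures
# the OSD id. \s+ also tolerates a line break between "on" and the path.
MISSING_KEYRING = re.compile(
    r"unable to find a keyring on\s+/var/lib/ceph/osd/ceph-(\d+)/keyring"
)


def osds_missing_keyring(journal_text):
    """Return the sorted, de-duplicated OSD ids whose keyring lookup failed."""
    return sorted({int(m.group(1)) for m in MISSING_KEYRING.finditer(journal_text)})
```

The function takes the journal output as a plain string, so it works on a
saved log file as well as on text piped in from the host.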

We also get these warnings from systemd:

[root@ceph1n012 ~]# systemctl status  ceph-osd@24
● ceph-osd@24.service - Ceph object storage daemon osd.24
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Sep 29 08:43:39 ceph1n012 systemd[1]: /usr/lib/systemd/system/ceph-osd@.service:23: Unknown lvalue 'ProtectHostname' in section 'Service'
Sep 29 08:43:39 ceph1n012 systemd[1]: /usr/lib/systemd/system/ceph-osd@.service:24: Unknown lvalue 'ProtectKernelLogs' in section 'Service'

It seems that the OSD process stops immediately, and no log files have been
written so far.

Does anyone have an explanation for this behaviour? What causes the missing
tmpfs mounts, and who mounts them: systemd, the OSD itself, or something else?

Many thanks for any hint that helps to get the 7 missing OSDs up ASAP.

Christoph



  cluster:
    id:     a9dd5dd5-f87c-4a42-a251-fa6d21934914
    health: HEALTH_WARN
            noout,norecover flag(s) set
            7 osds down
            2 hosts (7 osds) down
            Degraded data redundancy: 12808072/127922532 objects degraded (10.012%), 650 pgs degraded, 762 pgs undersized

  services:
    mon: 5 daemons, quorum ceph1n011,ceph1n021,ceph1n020,ceph1n019,ceph1n012 (age 73m)
    mgr: ceph1n019(active, since 26h), standbys: ceph1n020, ceph1n021
    mds: 8/8 daemons up, 3 standby
    osd: 79 osds: 72 up (since 78m), 79 in (since 7w)
         flags noout,norecover

{
    "mon": {
        "ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)": 5
    },
    "mgr": {
        "ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)": 3
    },
    "osd": {
        "ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)": 72
    },
    "mds": {
        "ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)": 3
    },
    "overall": {
        "ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)": 83
    }
}

Christoph Ackermann | System Engineer
INFOSERVE GmbH | Am Felsbrunnen 15 | D-66119 Saarbrücken
Fon +49 (0)681 88008-59 | Fax +49 (0)681 88008-33 | c.ackermann@xxxxxxxxxxxx
| www.infoserve.de
INFOSERVE Datenschutzhinweise: www.infoserve.de/datenschutz
Handelsregister: Amtsgericht Saarbrücken, HRB 11001 | Erfüllungsort:
Saarbrücken
Geschäftsführer: Dr. Stefan Leinenbach | Ust-IdNr.: DE168970599
