Re: OSD down, how to reconstruct it from its main and block.db parts ?

Hi Wladimir, according to the logs you first sent, it seems there is an
authentication issue (the osd daemon is not able to fetch the mon config):

> жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
> жов 23 16:59:36 p10s ceph-osd[3987]: failed to fetch mon config (--no-mon-config to skip)
> жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Main process exited, code=exited, status=1/FAILURE


The file it fails to load the keyring from is where the auth details for the
osd daemon should be stored.
Some more info here:
  https://docs.ceph.com/en/latest/man/8/ceph-authtool/
  https://docs.ceph.com/en/latest/rados/configuration/auth-config-ref/
  https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/
  (specifically step 5)
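
If the osd.1 key is still present in the cluster (check with "ceph auth ls"),
a minimal sketch to recreate that keyring file would be something like:

  ceph auth get osd.1 -o /var/lib/ceph/osd/ceph-1/keyring
  chown ceph:ceph /var/lib/ceph/osd/ceph-1/keyring

If the key is gone from the mons, you would have to register it again, as
described in step 5 of the add-or-rm-osds doc above.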

I'm not sure if you were able to fix it or not, but I'd start trying to get
that fixed before playing with ceph-volume.


On 10/27 10:24, Wladimir Mutel wrote:
> Dear David,
> 
> I assimilated most of my Ceph configuration into the cluster itself, since this feature was introduced in Mimic.
> I see some fsid in the [global] section of /etc/ceph/ceph.conf, and some key in the [client.admin] section of /etc/ceph/ceph.client.admin.keyring.
> The rest is pretty uninteresting: some minimal adjustments in the config file and the cluster's config dump.
> 
> Looking into the Python scripts of ceph-volume, I noticed that a tmpfs is mounted during the run of "ceph-volume lvm activate",
> and "ceph-bluestore-tool prime-osd-dir" is started from the same script afterwards.
> Should I try starting "ceph-volume lvm activate" in some manual way to see where it stumbles and why?
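> 
> For example, would something like this be the right manual invocation (using
> my OSD id and the osd_uuid reported by show-label in my first message)?
> 
>   ceph-volume lvm activate --bluestore 1 8c6324a3-0364-4fad-9dcb-81a1661ee202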
> 
> David Caro wrote:
> > Hi Wladimir,
> > 
> > If the "unable to find keyring" message disappeared, what was the error after that fix?
> > 
> > If it's still failing to fetch the mon config, check your authentication (you might have to add the osd key to the keyring again), and/or that the mon IPs are correct in your osd ceph.conf file.
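> > 
> > For example (just a sketch, adjust the names to your setup):
> > 
> >   ceph auth get osd.1   # is the osd.1 key still registered on the mons?
> >   ceph mon dump         # do these addresses match mon_host in /etc/ceph/ceph.conf?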
> > 
> > On 23 October 2020 16:08:02 CEST, Wladimir Mutel <mwg@xxxxxxxxx> wrote:
> > > Dear all,
> > > 
> > > after breaking my experimental 1-host Ceph cluster and making one of its
> > > PGs 'incomplete', I left it in an abandoned state for some time.
> > > Now I decided to bring it back to life and found that it cannot
> > > start one of its OSDs (osd.1, to name it).
> > > 
> > > "ceph osd df" shows :
> > > 
> > > ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
> > >  0    hdd        0   1.00000  2.7 TiB  1.6 TiB  1.6 TiB  113 MiB  4.7 GiB  1.1 TiB  59.77  0.69  102      up
> > >  1    hdd  2.84549         0      0 B      0 B      0 B      0 B      0 B      0 B       0     0    0    down
> > >  2    hdd  2.84549   1.00000  2.8 TiB  2.6 TiB  2.5 TiB   57 MiB  3.8 GiB  275 GiB  90.58  1.05  176      up
> > >  3    hdd  2.84549   1.00000  2.8 TiB  2.6 TiB  2.5 TiB   57 MiB  3.9 GiB  271 GiB  90.69  1.05  185      up
> > >  4    hdd  2.84549   1.00000  2.8 TiB  2.6 TiB  2.5 TiB   63 MiB  4.2 GiB  263 GiB  90.98  1.05  184      up
> > >  5    hdd  2.84549   1.00000  2.8 TiB  2.6 TiB  2.5 TiB   52 MiB  3.8 GiB  263 GiB  90.96  1.05  178      up
> > >  6    hdd  2.53400   1.00000  2.5 TiB  2.3 TiB  2.3 TiB  173 MiB  5.2 GiB  228 GiB  91.21  1.05  178      up
> > >  7    hdd  2.53400   1.00000  2.5 TiB  2.3 TiB  2.3 TiB  147 MiB  5.2 GiB  230 GiB  91.12  1.05  168      up
> > >               TOTAL    19 TiB   17 TiB   16 TiB  662 MiB   31 GiB  2.6 TiB  86.48
> > > MIN/MAX VAR: 0.69/1.05  STDDEV: 10.90
> > > 
> > > "ceph device ls" shows :
> > > 
> > > DEVICE                                      HOST:DEV      DAEMONS                        LIFE EXPECTANCY
> > > GIGABYTE_GP-ASACNE2100TTTDR_SN191108950380  p10s:nvme0n1  osd.1 osd.2 osd.3 osd.4 osd.5
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K1JJXVST        p10s:sdd      osd.1
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K1VUYPRA        p10s:sda      osd.6
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K2CKX8NT        p10s:sdb      osd.7
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K2UD8H74        p10s:sde      osd.2
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K2VFTR1F        p10s:sdh      osd.5
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K3CYKL87        p10s:sdf      osd.3
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K6FPZAJP        p10s:sdc      osd.0
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K7FXSCRN        p10s:sdg      osd.4
> > > 
> > > In my last migration, I created a bluestore volume with external
> > > block.db like this :
> > > 
> > > "ceph-volume lvm prepare --bluestore --data /dev/sdd1 --block.db
> > > /dev/nvme0n1p4"
> > > 
> > > And I can see this metadata by
> > > 
> > > "ceph-bluestore-tool show-label --dev
> > > /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202"
> > > :
> > > 
> > > {
> > >     "/dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202": {
> > >          "osd_uuid": "8c6324a3-0364-4fad-9dcb-81a1661ee202",
> > >          "size": 3000588304384,
> > >          "btime": "2020-07-12T11:34:16.579735+0300",
> > >          "description": "main",
> > >          "bfm_blocks": "45785344",
> > >          "bfm_blocks_per_key": "128",
> > >          "bfm_bytes_per_block": "65536",
> > >          "bfm_size": "3000588304384",
> > >          "bluefs": "1",
> > >          "ceph_fsid": "49cdfe90-6f6e-4afe-8558-bf14a13aadfa",
> > >          "kv_backend": "rocksdb",
> > >          "magic": "ceph osd volume v026",
> > >          "mkfs_done": "yes",
> > >          "osd_key": "AQD9ygpf+7+MABAAqtj4y1YYgxwCaAN/jgDSwg==",
> > >          "ready": "ready",
> > >          "require_osd_release": "14",
> > >          "whoami": "1"
> > >      }
> > > }
> > > 
> > > and by
> > > 
> > > "ceph-bluestore-tool show-label --dev /dev/nvme0n1p4" :
> > > 
> > > {
> > >      "/dev/nvme0n1p4": {
> > >          "osd_uuid": "8c6324a3-0364-4fad-9dcb-81a1661ee202",
> > >          "size": 128025886720,
> > >          "btime": "2020-07-12T11:34:16.592054+0300",
> > >          "description": "bluefs db"
> > >      }
> > > }
> > > 
> > > As you can see, their osd_uuid values are equal.
> > > But when I try to start it by hand with "systemctl restart ceph-osd@1",
> > > I get this in the logs ("journalctl -b -u ceph-osd@1"):
> > > 
> > > -- Logs begin at Tue 2020-10-13 19:09:49 EEST, end at Fri 2020-10-23 16:59:38 EEST. --
> > > жов 23 16:59:36 p10s systemd[1]: Starting Ceph object storage daemon osd.1...
> > > жов 23 16:59:36 p10s systemd[1]: Started Ceph object storage daemon osd.1.
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 AuthRegistry(0x560776222940) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 AuthRegistry(0x560776222940) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
> > > жов 23 16:59:36 p10s ceph-osd[3987]: failed to fetch mon config (--no-mon-config to skip)
> > > жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Main process exited, code=exited, status=1/FAILURE
> > > жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Failed with result 'exit-code'.
> > > 
> > > And so my question is: how can I make this OSD known to the Ceph cluster
> > > again without recreating it anew with ceph-volume?
> > > I see that every folder under "/var/lib/ceph/osd/" is a tmpfs mount
> > > point filled with the appropriate files and symlinks, except for
> > > "/var/lib/ceph/osd/ceph-1", which is just an empty folder not mounted anywhere.
> > > I tried to run
> > > 
> > > "ceph-bluestore-tool prime-osd-dir --dev
> > > /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202
> > > --path
> > > /var/lib/ceph/osd/ceph-1"
> > > 
> > > It created some files under /var/lib/ceph/osd/ceph-1, but without a tmpfs
> > > mount, and those files belonged to root. I changed the owner of these files
> > > to ceph:ceph and created the appropriate symlinks for block and block.db,
> > > but ceph-osd@1 still did not want to start. Only the "unable to find keyring"
> > > messages disappeared.
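> > > 
> > > In shell terms, what I did was roughly this (the symlink targets are my
> > > guess at the right devices, taken from the show-label output above):
> > > 
> > >   ceph-bluestore-tool prime-osd-dir --dev /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202 --path /var/lib/ceph/osd/ceph-1
> > >   chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
> > >   ln -snf /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202 /var/lib/ceph/osd/ceph-1/block
> > >   ln -snf /dev/nvme0n1p4 /var/lib/ceph/osd/ceph-1/block.db
> > >   systemctl restart ceph-osd@1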
> > > 
> > > Please give any help on where to move next.
> > > Thanks in advance for your help.
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > 
> 
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

-- 
David Caro
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



