Re: OSD down, how to reconstruct it from its main and block.db parts ?

Hi Wladimir,

If the "unable to find keyring" message disappeared, what was the error after that fix?

If it still fails to fetch the mon config, check your authentication (you might have to add the OSD key to the keyring again), and check that the mon IPs are correct in the ceph.conf file your OSD uses.
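
In case it helps, here is a rough sketch of those two checks as shell helpers. The OSD id and paths come from your mail; the function names (keyring_key, conf_mon_host, check_osd_auth) are made up for this example, and I am assuming a working admin keyring on the host so that "ceph auth get-key" can reach the mons. Nothing here modifies the cluster.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, paths follow the thread.

# Print the secret stored in a keyring file (the value after "key =").
keyring_key() {
    sed -n 's/^[[:space:]]*key[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Print the mon_host value from a ceph.conf-style file.
# Accepts both the "mon_host" and "mon host" spellings.
conf_mon_host() {
    sed -n 's/^[[:space:]]*mon[_ ]host[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Compare the OSD's local key with the one registered in the cluster.
# Read-only: it only queries the auth database.
check_osd_auth() {
    id="$1"
    local_key=$(keyring_key "/var/lib/ceph/osd/ceph-$id/keyring")
    if ! cluster_key=$(ceph auth get-key "osd.$id"); then
        echo "osd.$id: no auth entry found, or the mons could not be reached"
        return 1
    fi
    if [ "$local_key" = "$cluster_key" ]; then
        echo "osd.$id: local key matches the cluster"
    else
        echo "osd.$id: local key differs from the cluster"
        return 1
    fi
}
```

You would run "check_osd_auth 1", then compare the output of "conf_mon_host /etc/ceph/ceph.conf" with the addresses that "ceph mon dump" reports (assuming the OSD reads the default /etc/ceph/ceph.conf).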

On 23 October 2020 16:08:02 CEST, Wladimir Mutel <mwg@xxxxxxxxx> wrote:
>Dear all,
>
>after breaking my experimental 1-host Ceph cluster and leaving one of its
>PGs 'incomplete', I left it abandoned for some time.
>Now I have decided to bring it back to life and found that it cannot
>start one of its OSDs (osd.1, to be specific).
>
>"ceph osd df" shows :
>
>ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
>0    hdd        0   1.00000  2.7 TiB  1.6 TiB  1.6 TiB  113 MiB  4.7 GiB  1.1 TiB  59.77  0.69  102      up
>1    hdd  2.84549         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down
>2    hdd  2.84549   1.00000  2.8 TiB  2.6 TiB  2.5 TiB   57 MiB  3.8 GiB  275 GiB  90.58  1.05  176      up
>3    hdd  2.84549   1.00000  2.8 TiB  2.6 TiB  2.5 TiB   57 MiB  3.9 GiB  271 GiB  90.69  1.05  185      up
>4    hdd  2.84549   1.00000  2.8 TiB  2.6 TiB  2.5 TiB   63 MiB  4.2 GiB  263 GiB  90.98  1.05  184      up
>5    hdd  2.84549   1.00000  2.8 TiB  2.6 TiB  2.5 TiB   52 MiB  3.8 GiB  263 GiB  90.96  1.05  178      up
>6    hdd  2.53400   1.00000  2.5 TiB  2.3 TiB  2.3 TiB  173 MiB  5.2 GiB  228 GiB  91.21  1.05  178      up
>7    hdd  2.53400   1.00000  2.5 TiB  2.3 TiB  2.3 TiB  147 MiB  5.2 GiB  230 GiB  91.12  1.05  168      up
>                      TOTAL   19 TiB   17 TiB   16 TiB  662 MiB   31 GiB  2.6 TiB  86.48
>MIN/MAX VAR: 0.69/1.05  STDDEV: 10.90
>
>"ceph device ls" shows :
>
>DEVICE                                      HOST:DEV      DAEMONS                        LIFE EXPECTANCY
>GIGABYTE_GP-ASACNE2100TTTDR_SN191108950380  p10s:nvme0n1  osd.1 osd.2 osd.3 osd.4 osd.5
>WDC_WD30EFRX-68N32N0_WD-WCC7K1JJXVST        p10s:sdd      osd.1
>WDC_WD30EFRX-68N32N0_WD-WCC7K1VUYPRA        p10s:sda      osd.6
>WDC_WD30EFRX-68N32N0_WD-WCC7K2CKX8NT        p10s:sdb      osd.7
>WDC_WD30EFRX-68N32N0_WD-WCC7K2UD8H74        p10s:sde      osd.2
>WDC_WD30EFRX-68N32N0_WD-WCC7K2VFTR1F        p10s:sdh      osd.5
>WDC_WD30EFRX-68N32N0_WD-WCC7K3CYKL87        p10s:sdf      osd.3
>WDC_WD30EFRX-68N32N0_WD-WCC7K6FPZAJP        p10s:sdc      osd.0
>WDC_WD30EFRX-68N32N0_WD-WCC7K7FXSCRN        p10s:sdg      osd.4
>
>In my last migration, I created a bluestore volume with an external
>block.db like this:
>
>"ceph-volume lvm prepare --bluestore --data /dev/sdd1 --block.db /dev/nvme0n1p4"
>
>I can see its metadata with
>
>"ceph-bluestore-tool show-label --dev /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202":
>
>{
>"/dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202":
>{
>         "osd_uuid": "8c6324a3-0364-4fad-9dcb-81a1661ee202",
>         "size": 3000588304384,
>         "btime": "2020-07-12T11:34:16.579735+0300",
>         "description": "main",
>         "bfm_blocks": "45785344",
>         "bfm_blocks_per_key": "128",
>         "bfm_bytes_per_block": "65536",
>         "bfm_size": "3000588304384",
>         "bluefs": "1",
>         "ceph_fsid": "49cdfe90-6f6e-4afe-8558-bf14a13aadfa",
>         "kv_backend": "rocksdb",
>         "magic": "ceph osd volume v026",
>         "mkfs_done": "yes",
>         "osd_key": "AQD9ygpf+7+MABAAqtj4y1YYgxwCaAN/jgDSwg==",
>         "ready": "ready",
>         "require_osd_release": "14",
>         "whoami": "1"
>     }
>}
>
>and with
>
>"ceph-bluestore-tool show-label --dev /dev/nvme0n1p4":
>
>{
>     "/dev/nvme0n1p4": {
>         "osd_uuid": "8c6324a3-0364-4fad-9dcb-81a1661ee202",
>         "size": 128025886720,
>         "btime": "2020-07-12T11:34:16.592054+0300",
>         "description": "bluefs db"
>     }
>}
>
>As you can see, both labels carry the same osd_uuid.
>But when I try to start the OSD by hand with "systemctl restart ceph-osd@1",
>I get this in the logs ("journalctl -b -u ceph-osd@1"):
>
>-- Logs begin at Tue 2020-10-13 19:09:49 EEST, end at Fri 2020-10-23 16:59:38 EEST. --
>жов 23 16:59:36 p10s systemd[1]: Starting Ceph object storage daemon osd.1...
>жов 23 16:59:36 p10s systemd[1]: Started Ceph object storage daemon osd.1.
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 AuthRegistry(0x560776222940) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 AuthRegistry(0x560776222940) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
>жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
>жов 23 16:59:36 p10s ceph-osd[3987]: failed to fetch mon config (--no-mon-config to skip)
>жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Main process exited, code=exited, status=1/FAILURE
>жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Failed with result 'exit-code'.
>
>So my question is: how do I make this OSD known to the Ceph cluster again
>without recreating it from scratch with ceph-volume?
>I see that every folder under "/var/lib/ceph/osd/" is a tmpfs mount
>point filled with the appropriate files and symlinks, except for
>"/var/lib/ceph/osd/ceph-1",
>which is just an empty folder that is not mounted anywhere.
>I tried to run
>
>"ceph-bluestore-tool prime-osd-dir --dev /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202 --path /var/lib/ceph/osd/ceph-1"
>
>It created some files under /var/lib/ceph/osd/ceph-1, but without a tmpfs
>mount, and these files belonged to root. I changed their owner
>to ceph.ceph and
>created the appropriate symlinks for block and block.db, but ceph-osd@1
>still did not start. Only the "unable to find keyring" messages
>disappeared.
>
>Please advise on where to go next.
>Thanks in advance for your help.
>_______________________________________________
>ceph-users mailing list -- ceph-users@xxxxxxx
>To unsubscribe send an email to ceph-users-leave@xxxxxxx

-- 
David Caro



