Re: Converting to cephadm from ceph-deploy

/var/lib/ceph/osd/ceph-X/block is a soft link.  Follow the soft-link chain
down to the underlying device-mapper (dm) device and make sure ceph:ceph owns
it.

Example:
blah:/var/lib/ceph/osd/ceph-0 # ls -la block*
total 44
lrwxrwxrwx 1 ceph ceph 23 Apr 11  2019 block -> /dev/mapper/ceph-0block
lrwxrwxrwx 1 ceph ceph 20 Apr 11  2019 block.db -> /dev/mapper/ceph-0db
lrwxrwxrwx 1 ceph ceph 21 Apr 11  2019 block.wal -> /dev/mapper/ceph-0wal

blah:/var/lib/ceph/osd/ceph-0 # ls -la /dev/mapper/ceph-0block
lrwxrwxrwx 1 root root 8 Dec 28 12:41 /dev/mapper/ceph-0block -> ../dm-30

blah:/var/lib/ceph/osd/ceph-0 # ls -la /dev/dm-30
brw-rw---- 1 ceph ceph 254, 30 Dec 28 14:05 /dev/dm-30
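
readlink -f collapses that chain into a single step, and if the node has ended
up owned by root:root a plain chown is enough to get the OSD going for the
current boot.  Using the same OSD and dm device as the example above:

blah:/var/lib/ceph/osd/ceph-0 # readlink -f block
/dev/dm-30
blah:/var/lib/ceph/osd/ceph-0 # chown ceph:ceph /dev/dm-30

A manual chown won't survive a reboot, though.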


To correct the ownership problem persistently, I land a udev rule on the host:

cat > /etc/udev/rules.d/99-ceph-osd-${OSD_ID}.rules << EOF
ENV{DM_NAME}=="ceph-${OSD_ID}" OWNER="ceph" GROUP="ceph" MODE="0660"
ENV{DM_NAME}=="ceph-${OSD_ID}wal" OWNER="ceph" GROUP="ceph" MODE="0660"
ENV{DM_NAME}=="ceph-${OSD_ID}db" OWNER="ceph" GROUP="ceph" MODE="0660"
ENV{DM_NAME}=="ceph-${OSD_ID}block" OWNER="ceph" GROUP="ceph" MODE="0660"
EOF


On Tue, Dec 28, 2021 at 12:42 PM Andre Goree <agoree@xxxxxxxxxxxxxxxxxx>
wrote:

> First off, I made a similar post on 12/11/21, but I had not explicitly
> signed up for the new mailing list (this email is a remnant from when the
> list was run with mailman), so I didn't get a reply and couldn't respond; I
> have to post this again, and I apologize for the noise.
>
>
> Hello all.  I'm upgrading a cluster from Luminous (Ubuntu 16.04) to
> Pacific; along the way I've gone through Nautilus (18.04) and then Octopus
> (20.04).  The cluster ran flawlessly throughout that upgrade process, which
> I'm very happy about.
>
> I'm now at the point of converting the cluster to cephadm (it was built
> with
> ceph-deploy), but I'm running into trouble.  I've followed this doc:
> https://docs.ceph.com/en/latest/cephadm/adoption/
>
> The cluster consists of:
>
> 3 MON nodes
> 4 OSD nodes
>
> The trouble is two-fold.  (1) Once I've adopted the MON & MGR daemons, I
> can't get the local MON to show up in "ceph orch ps"; only the two other
> MON nodes are listed:
>
> #### On MON node ####
> root@cephmon01test:~# ceph orch ps
> NAME               HOST           PORTS  STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> mgr.cephmon02test  cephmon02test         running (21h)  8m ago     21h     365M        -  16.2.5   6933c2a0b7dd  e08de388b92e
> mgr.cephmon03test  cephmon03test         running (21h)  6m ago     21h     411M        -  16.2.5   6933c2a0b7dd  d358b697e49b
> mon.cephmon02test  cephmon02test         running (21h)  8m ago       -     934M    2048M  16.2.5   6933c2a0b7dd  f349d7cc6816
> mon.cephmon03test  cephmon03test         running (21h)  6m ago       -     923M    2048M  16.2.5   6933c2a0b7dd  64880b0659cc
>
> root@cephmon01test:~# ceph orch ls
> NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
> mgr              2/0  8m ago     -    <unmanaged>
> mon              2/0  8m ago     -    <unmanaged>
>
>
> All of the 'cephadm adopt' commands for the MONs and MGRs were run from
> the above
> node.
>
> My second issue is that when I proceed to adopt the OSDs (again, following
> https://docs.ceph.com/en/latest/cephadm/adoption/), they seem to drop out
> of the cluster:
>
> ### on OSD node ###
> root@cephosd01test:~# cephadm ls
> [
>     {
>         "style": "cephadm:v1",
>         "name": "osd.0",
>         "fsid": "4cfa6467-6647-41e9-8184-1cacc408265c",
>         "systemd_unit": "ceph-4cfa6467-6647-41e9-8184-1cacc408265c@osd.0",
>         "enabled": true,
>         "state": "error",
>         "container_id": null,
>         "container_image_name": "ceph/ceph:v16",
>         "container_image_id": null,
>         "version": null,
>         "started": null,
>         "created": null,
>         "deployed": "2021-12-11T00:19:24.799615Z",
>         "configured": null
>     },
>     {
>         "style": "cephadm:v1",
>         "name": "osd.1",
>         "fsid": "4cfa6467-6647-41e9-8184-1cacc408265c",
>         "systemd_unit": "ceph-4cfa6467-6647-41e9-8184-1cacc408265c@osd.1",
>         "enabled": true,
>         "state": "error",
>         "container_id": null,
>         "container_image_name": "ceph/ceph:v16",
>         "container_image_id": null,
>         "version": null,
>         "started": null,
>         "created": null,
>         "deployed": "2021-12-11T21:20:02.170515Z",
>         "configured": null
>     }
> ]
>
> Ceph health snippet:
>   services:
>     mon: 3 daemons, quorum cephmon02test,cephmon03test,cephmon01test (age 21h)
>     mgr: cephmon03test(active, since 21h), standbys: cephmon02test
>     osd: 8 osds: 6 up (since 39m), 8 in
>          flags noout
>
> Is there a specific way to get those OSDs adopted by cephadm to be shown
> properly in the
> cluster and ceph orchestrator?
>
> I asked the same question elsewhere and was asked whether I could see my
> containers running; here is what I found:
>
> Further background info: this cluster was built with 'ceph-deploy' on
> 12.2.4.  I'm not sure if that's an issue _specifically_ for the conversion
> to cephadm, but I've been able to upgrade from Ubuntu Xenial & Luminous to
> Ubuntu Focal & Pacific -- it's just this conversion to cephadm that I'm
> having trouble with.  This cluster is _only_ used for RBD devices (via
> Libvirt).
>
> When I run "bash -x /var/lib/ceph/$FSID/osd.0/unit.run" I find that it's
> failing after looking for a block device that doesn't exist -- namely
> /var/lib/ceph/osd/ceph-0. This device was accurate for the
> ceph-deploy-built OSDs, but after 'cephadm adopt' has been run, the correct
> block device is '/dev/dm-1' if I'm not mistaken.
>
> Looking at the cephadm logs, it appears this was by design as far as
> cephadm is concerned, however this is clearly the wrong device and so the
> containers fail to start.
>
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080  1 bluestore(/var/lib/ceph/osd/ceph-0) _mount path /var/lib/ceph/osd/ceph-0
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080  0 bluestore(/var/lib/ceph/osd/ceph-0) _open_db_and_around read-only:0 repair:0
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080  1 bdev(0x5642f6a9a400 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1 bdev(0x5642f6a9a400 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1 osd.0 0 OSD:init: unable to mount object store
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1  ** ERROR: osd init failed: (13) Permission denied
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


