Re: Converting to cephadm from ceph-deploy

Andre Goree <agoree@xxxxxxxxxxxxxxxxxx> · Wed, 29 Dec 2021 00:28:35 +0000

The one issue I'm seeing, and probably the root of my problem, is that cephadm expects the 'ceph' user to have uid 167, but it's something else entirely on my system (perhaps because it's an older Luminous cluster built with ceph-deploy).

However, even when changing the ceph uid to what cephadm/docker is looking for (167), something is changing the perms on /dev/dm-1.

Annnnnd I got it working using the udev rules you provided!  So, I think to fix my whole issue, I'll need to make sure the uid & gid for the ceph user are set to 167 (not sure why that was set but the fix is easy enough) and have udev rules available to properly set the perms on /dev/dm-X.

Thanks!
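
For anyone hitting the same mismatch, a minimal sketch of the uid/gid check described above. The value 167 is taken from this thread; the function name check_ceph_ids is hypothetical and only used for illustration.

```shell
#!/bin/sh
# uid/gid that the containerized daemons expect, per the thread above.
EXPECTED_ID=167

# check_ceph_ids UID GID: report whether the host ids match the expected value.
check_ceph_ids() {
    if [ "$1" = "$EXPECTED_ID" ] && [ "$2" = "$EXPECTED_ID" ]; then
        echo "ok"
    else
        echo "mismatch: uid=$1 gid=$2, expected $EXPECTED_ID"
    fi
}

# Only run the check if a 'ceph' user actually exists on this host.
if id ceph >/dev/null 2>&1; then
    check_ceph_ids "$(id -u ceph)" "$(id -g ceph)"
fi
```

The comparison is kept in a function that takes plain numbers so it can be exercised on a machine that has no ceph user at all.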

________________________________________
From: Andre Goree <agoree@xxxxxxxxxxxxxxxxxx>
Sent: Tuesday, December 28, 2021 6:40 PM
To: Mazzystr
Cc: ceph-users@xxxxxxx
Subject: Re:  Converting to cephadm from ceph-deploy

Thank you!  I did figure that it maybe should be a soft link, and in fact I tried to fix it by linking everything properly, but as you've shown with your 'ls' example of that directory, I certainly missed a few things.  This helps immensely.

Oddly enough, however, even the dir '/var/lib/ceph/osd/ceph-X' itself does not exist, and if I'm not mistaken, it is copied to '/var/lib/ceph/$FSID/osd.X'.  Easy enough to determine how that needs to be symlinked, and inside 'osd.X' I see the relevant 'block' link so it does appear that everything's there.  The perms are another aspect I hadn't considered.  I'm going to try to work this out and report back, thanks!

________________________________________
From: Mazzystr <mazzystr@xxxxxxxxx>
Sent: Tuesday, December 28, 2021 5:10 PM
To: Andre Goree
Cc: ceph-users@xxxxxxx
Subject: Re:  Converting to cephadm from ceph-deploy

/var/lib/ceph/osd/ceph-X/block is a soft link.  Follow the soft link chain down to the devmapper device and make sure ceph:ceph owns that device.

Example:
blah:/var/lib/ceph/osd/ceph-0 # ls -la block*
total 44
lrwxrwxrwx 1 ceph ceph 23 Apr 11  2019 block -> /dev/mapper/ceph-0block
lrwxrwxrwx 1 ceph ceph 20 Apr 11  2019 block.db -> /dev/mapper/ceph-0db
lrwxrwxrwx 1 ceph ceph 21 Apr 11  2019 block.wal -> /dev/mapper/ceph-0wal

blah:/var/lib/ceph/osd/ceph-0 # ls -la /dev/mapper/ceph-0block
lrwxrwxrwx 1 root root 8 Dec 28 12:41 /dev/mapper/ceph-0block -> ../dm-30

blah:/var/lib/ceph/osd/ceph-0 # ls -la /dev/dm-30
brw-rw---- 1 ceph ceph 254, 30 Dec 28 14:05 /dev/dm-30
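
The three 'ls' steps above can be collapsed into one helper. This is only a sketch: it assumes GNU coreutils (readlink -f and stat -c), and the function name show_block_target is made up for illustration.

```shell
#!/bin/sh
# show_block_target LINK: follow a symlink chain to its final target and
# print "<target> <owner>:<group> <octal mode>" for that target.
show_block_target() {
    target=$(readlink -f -- "$1") || return 1
    printf '%s %s\n' "$target" "$(stat -c '%U:%G %a' -- "$target")"
}
```

Calling it as 'show_block_target /var/lib/ceph/osd/ceph-0/block' on the host in the example should print the /dev/dm-N node together with its owner and mode, which is the line that needs to read ceph:ceph.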

I drop a udev rule onto the host to correct the ownership problem:

cat > /etc/udev/rules.d/99-ceph-osd-${OSD_ID}.rules << EOF
ENV{DM_NAME}=="ceph-${OSD_ID}", OWNER="ceph", GROUP="ceph", MODE="0660"
ENV{DM_NAME}=="ceph-${OSD_ID}wal", OWNER="ceph", GROUP="ceph", MODE="0660"
ENV{DM_NAME}=="ceph-${OSD_ID}db", OWNER="ceph", GROUP="ceph", MODE="0660"
ENV{DM_NAME}=="ceph-${OSD_ID}block", OWNER="ceph", GROUP="ceph", MODE="0660"
EOF
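
A rough sketch of how the same rules could be generated for several OSDs and then activated without a reboot. The device-mapper names (ceph-<id>, ceph-<id>block and so on) are the ones from the example host above and will differ elsewhere ('dmsetup ls' shows the real names). Both function names are hypothetical.

```shell
#!/bin/sh
# render_osd_rules ID: print the udev rules for one OSD's device-mapper names.
render_osd_rules() {
    for suffix in "" wal db block; do
        printf 'ENV{DM_NAME}=="ceph-%s%s", OWNER="ceph", GROUP="ceph", MODE="0660"\n' "$1" "$suffix"
    done
}

# install_osd_rules ID...: write one rules file per OSD, then ask udev to
# reload its rules and re-process block devices. Needs root; not called here.
install_osd_rules() {
    for id in "$@"; do
        render_osd_rules "$id" > "/etc/udev/rules.d/99-ceph-osd-$id.rules"
    done
    udevadm control --reload
    udevadm trigger --subsystem-match=block
}
```

For example, 'install_osd_rules 0 1' as root would write two rules files and re-trigger the block devices, after which 'ls -la /dev/dm-*' should show the new ownership.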

On Tue, Dec 28, 2021 at 12:42 PM Andre Goree <agoree@xxxxxxxxxxxxxxxxxx> wrote:
First off, I made a similar post on 12/11/21 but had not explicitly signed up for the new mailing list (this email is a remnant from when the list was run with mailman), so I didn't get a reply here and couldn't reply myself. I have to post this again; I apologize for the noise.

Hello all.  I'm upgrading a cluster from (Ubuntu 16.04) Luminous to Pacific, within
which I've upgraded to (18.04) Nautilus, then to (20.04) Octopus.  The cluster ran
flawlessly throughout that upgrade process, which I'm very happy about.

I'm now at the point of converting the cluster to cephadm (it was built with
ceph-deploy), but I'm running into trouble.  I've followed this doc:
https://docs.ceph.com/en/latest/cephadm/adoption/

3 MON nodes
4 OSD nodes

The trouble is two-fold:  (1) once I've adopted the MON & MGR daemons, I can't
get the localhost MON to show up in "ceph orch ps"; only the daemons on the two
other MON nodes are listed:

#### On MON node ####
root@cephmon01test:~# ceph orch ps
NAME               HOST           PORTS  STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mgr.cephmon02test  cephmon02test         running (21h)     8m ago  21h     365M        -  16.2.5   6933c2a0b7dd  e08de388b92e
mgr.cephmon03test  cephmon03test         running (21h)     6m ago  21h     411M        -  16.2.5   6933c2a0b7dd  d358b697e49b
mon.cephmon02test  cephmon02test         running (21h)     8m ago    -     934M    2048M  16.2.5   6933c2a0b7dd  f349d7cc6816
mon.cephmon03test  cephmon03test         running (21h)     6m ago    -     923M    2048M  16.2.5   6933c2a0b7dd  64880b0659cc

root@cephmon01test:~# ceph orch ls
NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mgr              2/0  8m ago     -    <unmanaged>
mon              2/0  8m ago     -    <unmanaged>

All of the 'cephadm adopt' commands for the MONs and MGRs were run from the above
node.

My second issue is that when I proceed to adopt the OSDs (again, following
https://docs.ceph.com/en/latest/cephadm/adoption/), they seem to drop out of the cluster:

### on OSD node ###
root@cephosd01test:~# cephadm ls
[
    {
        "style": "cephadm:v1",
        "name": "osd.0",
        "fsid": "4cfa6467-6647-41e9-8184-1cacc408265c",
        "systemd_unit": "ceph-4cfa6467-6647-41e9-8184-1cacc408265c@osd.0",
        "enabled": true,
        "state": "error",
        "container_id": null,
        "container_image_name": "ceph/ceph:v16",
        "container_image_id": null,
        "version": null,
        "started": null,
        "created": null,
        "deployed": "2021-12-11T00:19:24.799615Z",
        "configured": null
    },
    {
        "style": "cephadm:v1",
        "name": "osd.1",
        "fsid": "4cfa6467-6647-41e9-8184-1cacc408265c",
        "systemd_unit": "ceph-4cfa6467-6647-41e9-8184-1cacc408265c@osd.1",
        "enabled": true,
        "state": "error",
        "container_id": null,
        "container_image_name": "ceph/ceph:v16",
        "container_image_id": null,
        "version": null,
        "started": null,
        "created": null,
        "deployed": "2021-12-11T21:20:02.170515Z",
        "configured": null
    }
]
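
With more than a couple of OSDs per node it helps to list only the daemons that are in the error state. The sketch below assumes the pretty-printed layout shown above (one key per line, "name" before "state" within each entry). It is a quick text filter, not a JSON parser, so a tool such as jq would be more robust where it is installed. The function name failed_daemons is hypothetical.

```shell
#!/bin/sh
# failed_daemons: read pretty-printed `cephadm ls` output on stdin and print
# the name of every daemon whose "state" is "error".
failed_daemons() {
    awk -F'"' '
        $2 == "name"                   { name = $4 }
        $2 == "state" && $4 == "error" { print name }
    '
}
```

Used as 'cephadm ls | failed_daemons', it would print osd.0 and osd.1 for the output above.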

Ceph health snippet:
  services:
    mon: 3 daemons, quorum cephmon02test,cephmon03test,cephmon01test (age 21h)
    mgr: cephmon03test(active, since 21h), standbys: cephmon02test
    osd: 8 osds: 6 up (since 39m), 8 in
         flags noout

Is there a specific way to get those OSDs adopted by cephadm to be shown properly in the
cluster and ceph orchestrator?

I asked the same question elsewhere and was asked whether I could see my containers running. My reply to that:

Further background info: this cluster was built with 'ceph-deploy' on 12.2.4. I'm not sure if that's an issue _specifically_ for the conversion to cephadm, but I've been able to upgrade from Ubuntu Xenial & Luminous to Ubuntu Focal & Pacific -- it's just this conversion to cephadm that I'm having the issue with. This cluster is _only_ used for RBD devices (via Libvirt).

When I run "bash -x /var/lib/ceph/$FSID/osd.0/unit.run" I find that it's failing after looking for a block device that doesn't exist -- namely /var/lib/ceph/osd/ceph-0. This device was accurate for the ceph-deploy-built OSDs, but after 'cephadm adopt' has been run, the correct block device is '/dev/dm-1' if I'm not mistaken.

Looking at the cephadm logs, it appears this was by design as far as cephadm is concerned; however, this is clearly the wrong device, and so the containers fail to start.

debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
debug 2021-12-28T03:33:58.368+0000 7f4b3207c080  1 bluestore(/var/lib/ceph/osd/ceph-0) _mount path /var/lib/ceph/osd/ceph-0
debug 2021-12-28T03:33:58.368+0000 7f4b3207c080  0 bluestore(/var/lib/ceph/osd/ceph-0) _open_db_and_around read-only:0 repair:0
debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
debug 2021-12-28T03:33:58.368+0000 7f4b3207c080  1 bdev(0x5642f6a9a400 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1 bdev(0x5642f6a9a400 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1 osd.0 0 OSD:init: unable to mount object store
debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1  ** ERROR: osd init failed: (13) Permission denied
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx