Re: Converting to cephadm from ceph-deploy

Glad to help  :)

re ceph user .... Oh yea, that is an artifact left over from when I was
running on bare metal.  I need to change the rule to use uid/gid 167 as
well.
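
Something like this in the rule, with numeric IDs in place of the names
(untested on my end, since my bare-metal hosts still use the named ceph
user):

ENV{DM_NAME}=="ceph-${OSD_ID}block", OWNER="167", GROUP="167", MODE="0660"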

On Tue, Dec 28, 2021 at 4:28 PM Andre Goree <agoree@xxxxxxxxxxxxxxxxxx>
wrote:

> The one issue I'm seeing and probably the root of my problem is that
> cephadm set the user 'ceph' uid to 167...it's something else entirely on my
> system (perhaps from the fact that it's an older Luminous cluster built
> with ceph-deploy).
>
> However, even when changing the ceph uid to what cephadm/docker is looking
> for (167), something is changing the perms on /dev/dm-1.
>
> Annnnnd I got it working using the udev rules you provided!  So, I think
> for my whole issue, I'll need to make sure the uid & gid for the ceph user
> are set to 167 (not sure why they ended up as something else, but the fix
> is easy enough) and have udev rules available to properly set the perms on
> /dev/dm-X.
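>
> (For my own notes, the uid/gid change itself should just be something like
> the following, with the OSDs on this host stopped first -- the exact set of
> paths to re-chown may vary:
>
> groupmod -g 167 ceph
> usermod -u 167 -g 167 ceph
> chown -R ceph:ceph /var/lib/ceph /var/log/ceph
> )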
>
> Thanks!
>
> ________________________________________
> From: Andre Goree <agoree@xxxxxxxxxxxxxxxxxx>
> Sent: Tuesday, December 28, 2021 6:40 PM
> To: Mazzystr
> Cc: ceph-users@xxxxxxx
> Subject: Re:  Converting to cephadm from ceph-deploy
>
> Thank you!  I did figure that it maybe should be a soft link, and in fact
> I tried to fix it by linking everything properly, but as you've shown with
> your 'ls' example of that directory, I certainly missed a few things.  This
> helps immensely.
>
> Oddly enough, however, even the dir '/var/lib/ceph/osd/ceph-X' itself does
> not exist; if I'm not mistaken, its contents are copied to
> '/var/lib/ceph/$FSID/osd.X'.  It's easy enough to determine how that needs
> to be symlinked, and inside 'osd.X' I see the relevant 'block' link, so it
> does appear that everything's there.  The perms are another aspect I hadn't
> considered.  I'm going to try to work this out and report back, thanks!
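>
> (Roughly what I'll be checking, with $FSID substituted for my cluster's
> fsid -- resolve the 'block' link and confirm who owns the device it points
> at:
>
> ls -la /var/lib/ceph/$FSID/osd.X/
> readlink -f /var/lib/ceph/$FSID/osd.X/block
> ls -la "$(readlink -f /var/lib/ceph/$FSID/osd.X/block)"
> )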
>
> ________________________________________
> From: Mazzystr <mazzystr@xxxxxxxxx>
> Sent: Tuesday, December 28, 2021 5:10 PM
> To: Andre Goree
> Cc: ceph-users@xxxxxxx
> Subject: Re:  Converting to cephadm from ceph-deploy
>
> /var/lib/ceph/osd/ceph-X/block is a soft link.  Track down the soft link
> chain to the devmapper device and make sure ceph:ceph owns it.
>
> Example:
> blah:/var/lib/ceph/osd/ceph-0 # ls -la block*
> total 44
> lrwxrwxrwx 1 ceph ceph 23 Apr 11  2019 block -> /dev/mapper/ceph-0block
> lrwxrwxrwx 1 ceph ceph 20 Apr 11  2019 block.db -> /dev/mapper/ceph-0db
> lrwxrwxrwx 1 ceph ceph 21 Apr 11  2019 block.wal -> /dev/mapper/ceph-0wal
>
> blah:/var/lib/ceph/osd/ceph-0 # ls -la /dev/mapper/ceph-0block
> lrwxrwxrwx 1 root root 8 Dec 28 12:41 /dev/mapper/ceph-0block -> ../dm-30
>
> blah:/var/lib/ceph/osd/ceph-0 # ls -la /dev/dm-30
> brw-rw---- 1 ceph ceph 254, 30 Dec 28 14:05 /dev/dm-30
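>
> Or in one shot -- readlink/stat will collapse the chain for you:
>
> blah:/var/lib/ceph/osd/ceph-0 # stat -c '%U:%G %a' "$(readlink -f block)"
>
> which should come back as ceph:ceph 660 when the ownership is right.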
>
>
> I drop a udev rule onto the host to correct the ownership problem:
>
> cat > /etc/udev/rules.d/99-ceph-osd-${OSD_ID}.rules << EOF
> ENV{DM_NAME}=="ceph-${OSD_ID}", OWNER="ceph", GROUP="ceph", MODE="0660"
> ENV{DM_NAME}=="ceph-${OSD_ID}wal", OWNER="ceph", GROUP="ceph", MODE="0660"
> ENV{DM_NAME}=="ceph-${OSD_ID}db", OWNER="ceph", GROUP="ceph", MODE="0660"
> ENV{DM_NAME}=="ceph-${OSD_ID}block", OWNER="ceph", GROUP="ceph", MODE="0660"
> EOF
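>
> Then reload and retrigger udev so the rule takes effect without a reboot
> (plain udevadm, nothing ceph-specific):
>
> udevadm control --reload-rules
> udevadm trigger --subsystem-match=block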
>
>
>
> On Tue, Dec 28, 2021 at 12:42 PM Andre Goree <agoree@xxxxxxxxxxxxxxxxxx>
> wrote:
> First off, I made a similar post on 12/11/21, but I had not explicitly
> signed up for the new mailing list (this email address is a remnant from
> when the list was run with mailman), so I didn't get any replies here and
> couldn't reply myself.  I have to post this again; I apologize for the
> noise.
>
>
> Hello all.  I'm upgrading a cluster from (Ubuntu 16.04) Luminous to
> Pacific, during which I've upgraded to (18.04) Nautilus, then to (20.04)
> Octopus.  The cluster ran flawlessly throughout that upgrade process, which
> I'm very happy about.
>
> I'm now at the point of converting the cluster to cephadm (it was built
> with
> ceph-deploy), but I'm running into trouble.  I've followed this doc:
> https://docs.ceph.com/en/latest/cephadm/adoption/
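>
> For reference, the adoption commands were the ones from that doc, along the
> lines of:
>
> cephadm adopt --style legacy --name mon.<hostname>
> cephadm adopt --style legacy --name mgr.<hostname>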
>
> 3 MON nodes
> 4 OSD nodes
>
> The trouble is two-fold:  (1) once I've adopted the MON & MGR daemons, I
> can't get the localhost MON to show up in "ceph orch ps"; only the two
> other MON nodes are listed:
>
> #### On MON node ####
> root@cephmon01test:~# ceph orch ps
> NAME               HOST           PORTS  STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> mgr.cephmon02test  cephmon02test         running (21h)  8m ago     21h  365M     -        16.2.5   6933c2a0b7dd  e08de388b92e
> mgr.cephmon03test  cephmon03test         running (21h)  6m ago     21h  411M     -        16.2.5   6933c2a0b7dd  d358b697e49b
> mon.cephmon02test  cephmon02test         running (21h)  8m ago     -    934M     2048M    16.2.5   6933c2a0b7dd  f349d7cc6816
> mon.cephmon03test  cephmon03test         running (21h)  6m ago     -    923M     2048M    16.2.5   6933c2a0b7dd  64880b0659cc
>
> root@cephmon01test:~# ceph orch ls
> NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
> mgr              2/0  8m ago     -    <unmanaged>
> mon              2/0  8m ago     -    <unmanaged>
>
>
> All of the 'cephadm adopt' commands for the MONs and MGRs were run from
> the above
> node.
>
> My second issue is that when I proceed to adopt the OSDs (again, following
> https://docs.ceph.com/en/latest/cephadm/adoption/), they seem to drop out
> of the cluster:
>
> ### on OSD node ###
> root@cephosd01test:~# cephadm ls
> [
>     {
>         "style": "cephadm:v1",
>         "name": "osd.0",
>         "fsid": "4cfa6467-6647-41e9-8184-1cacc408265c",
>         "systemd_unit":
> &quot;ceph-4cfa6467-6647-41e9-8184-1cacc408265c(a)osd.0&quot;sd.0",
>         "enabled": true,
>         "state": "error",
>         "container_id": null,
>         "container_image_name": "ceph/ceph:v16",
>         "container_image_id": null,
>         "version": null,
>         "started": null,
>         "created": null,
>         "deployed": "2021-12-11T00:19:24.799615Z",
>         "configured": null
>     },
>     {
>         "style": "cephadm:v1",
>         "name": "osd.1",
>         "fsid": "4cfa6467-6647-41e9-8184-1cacc408265c",
>         "systemd_unit":
> &quot;ceph-4cfa6467-6647-41e9-8184-1cacc408265c(a)osd.1&quot;sd.1",
>         "enabled": true,
>         "state": "error",
>         "container_id": null,
>         "container_image_name": "ceph/ceph:v16",
>         "container_image_id": null,
>         "version": null,
>         "started": null,
>         "created": null,
>         "deployed": "2021-12-11T21:20:02.170515Z",
>         "configured": null
>     }
> ]
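>
> (The adoption itself was the per-OSD command from the same doc, roughly:
>
> cephadm adopt --style legacy --name osd.0
> cephadm adopt --style legacy --name osd.1
>
> run for each OSD on this node.)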
>
> Ceph health snippet:
>   services:
>     mon: 3 daemons, quorum cephmon02test,cephmon03test,cephmon01test (age
> 21h)
>     mgr: cephmon03test(active, since 21h), standbys: cephmon02test
>     osd: 8 osds: 6 up (since 39m), 8 in
>          flags noout
>
> Is there a specific way to get these cephadm-adopted OSDs to show up
> properly in the cluster and in the ceph orchestrator?
>
> I asked the same question elsewhere and was asked whether I could see my
> containers running; here is my reply to that:
>
> Further background info: this cluster was built with 'ceph-deploy' on
> 12.2.4.  I'm not sure if that's an issue _specifically_ for the conversion
> to cephadm, but I've been able to upgrade from Ubuntu Xenial & Luminous to
> Ubuntu Focal & Pacific -- it's just this conversion to cephadm that I'm
> having the issue with.  This cluster is _only_ used for RBD devices (via
> Libvirt).
>
> When I run "bash -x /var/lib/ceph/$FSID/osd.0/unit.run", I find that it's
> failing after looking for a data path that no longer exists -- namely
> /var/lib/ceph/osd/ceph-0.  That path was correct for the ceph-deploy-built
> OSDs, but after 'cephadm adopt' has been run, the actual block device is
> '/dev/dm-1', if I'm not mistaken.
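>
> (Roughly how I'm mapping the OSD to that device -- assuming ceph-volume is
> still usable on the converted host:
>
> ceph-volume lvm list
> lsblk -o NAME,MAJ:MIN,OWNER,GROUP,MODE /dev/dm-1
>
> the second command shows the ownership and mode on the dm device itself.)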
>
> Looking at the cephadm logs, it appears this is by design as far as cephadm
> is concerned; however, this is clearly the wrong device, and so the
> containers fail to start:
>
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080  1 bluestore(/var/lib/ceph/osd/ceph-0) _mount path /var/lib/ceph/osd/ceph-0
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080  0 bluestore(/var/lib/ceph/osd/ceph-0) _open_db_and_around read-only:0 repair:0
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080  1 bdev(0x5642f6a9a400 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1 bdev(0x5642f6a9a400 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1 osd.0 0 OSD:init: unable to mount object store
> debug 2021-12-28T03:33:58.368+0000 7f4b3207c080 -1  ** ERROR: osd init failed: (13) Permission denied
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


