Re: Brand New Cephadm Deployment, OSDs show either in/down or out/down

Sebastian Wagner <sewagner@xxxxxxxxxx> · Thu, 2 Sep 2021 10:46:07 +0200

Can you verify that the `/usr/lib/sysctl.d/` folder exists on your 
Debian machines?
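
A quick way to answer this on each host is a check along the following lines. This is only a minimal sketch: the helper name `sysctl_dir_exists` is made up for illustration, and the path is the one that appears in the FileNotFoundError quoted further down in this thread.

```python
import os

# Directory that cephadm tried to write its sysctl drop-in to,
# taken from the FileNotFoundError quoted later in this thread.
SYSCTL_DIR = "/usr/lib/sysctl.d"


def sysctl_dir_exists(path: str = SYSCTL_DIR) -> bool:
    """Return True if `path` exists and is a directory (hypothetical helper)."""
    return os.path.isdir(path)


if __name__ == "__main__":
    state = "exists" if sysctl_dir_exists() else "is missing"
    print(f"{SYSCTL_DIR} {state}")
```

Running it on each of the three hosts should show whether the directory is present there.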

On 01.09.21 at 15:19, Alcatraz wrote:
Sebastian,

I appreciate all your help. I actually (out of desperation) spun up 
another cluster, same specs, just using Ubuntu 18.04 rather than 
Debian 10. All the OSDs were recognized, and all went up/in without 
issue.

Thanks

On 9/1/21 06:15, Sebastian Wagner wrote:
On 30.08.21 at 17:39, Alcatraz wrote:
Sebastian,

Thanks for responding! And of course.

1. ceph orch ls --service-type osd --format yaml

Output:

service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  host_pattern: '*'
unmanaged: true
spec:
  data_devices:
    all: true
  filter_logic: AND
  objectstore: bluestore
status:
  created: '2021-08-30T13:57:51.000178Z'
  last_refresh: '2021-08-30T15:24:10.534710Z'
  running: 0
  size: 6
events:
- 2021-08-30T03:48:01.652108Z service:osd.all-available-devices [INFO] "service was
  created"
- "2021-08-30T03:49:00.267808Z service:osd.all-available-devices [ERROR] \"Failed\
  \ to apply: cephadm exited with an error code: 1, stderr:Non-zero exit code 1 from\
  \ /usr/bin/docker container inspect --format {{.State.Status}} ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0\n\
  /usr/bin/docker: stdout \n/usr/bin/docker: stderr Error: No such container: ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0\n\
  Deploy daemon osd.0 ...\nTraceback (most recent call last):\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 8230, in <module>\n    main()\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 8218, in main\n    r = ctx.func(ctx)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 1759, in _default_image\n    return func(ctx)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 4326, in command_deploy\n    ports=daemon_ports)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 2632, in deploy_daemon\n    c, osd_fsid=osd_fsid, ports=ports)\n  File \"\
  /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 2801, in deploy_daemon_units\n    install_sysctl(ctx, fsid, daemon_type)\n\
  \  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 2963, in install_sysctl\n    _write(conf, lines)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 2948, in _write\n    with open(conf, 'w') as f:\nFileNotFoundError: [Errno\
  \ 2] No such file or directory: '/usr/lib/sysctl.d/90-ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.conf'\""

https://tracker.ceph.com/issues/52481

- '2021-08-30T03:49:08.356762Z service:osd.all-available-devices [ERROR] "Failed to
  apply: auth get failed: failed to find osd.0 in keyring retval: -2"'
- '2021-08-30T03:52:34.100977Z service:osd.all-available-devices [ERROR] "Failed to
  apply: auth get failed: failed to find osd.3 in keyring retval: -2"'
- '2021-08-30T03:52:42.260439Z service:osd.all-available-devices [ERROR] "Failed to
  apply: auth get failed: failed to find osd.6 in keyring retval: -2"'

Will be fixed by https://github.com/ceph/ceph/pull/42989
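
The first [ERROR] event above ends in a FileNotFoundError raised by `open(conf, 'w')`. Python's `open()` in write mode creates the file but not any missing parent directory, which is what one would expect to see if `/usr/lib/sysctl.d/` is absent on the host. The sketch below reproduces that behaviour in a temporary directory. `write_conf` and the file name `90-example.conf` are stand-ins invented for this illustration, not cephadm's actual code, and creating the directory first is only an assumed workaround, not a description of any upstream fix.

```python
import os
import tempfile
from typing import List, Tuple


def write_conf(conf: str, lines: List[str]) -> None:
    """Stand-in config writer: opens `conf` for writing, as in the traceback."""
    with open(conf, "w") as f:
        f.write("\n".join(lines) + "\n")


def demo() -> Tuple[bool, bool]:
    """Return (failed_without_dir, succeeded_with_dir) for a throwaway path."""
    with tempfile.TemporaryDirectory() as root:
        conf_dir = os.path.join(root, "sysctl.d")  # deliberately not created yet
        conf = os.path.join(conf_dir, "90-example.conf")  # hypothetical file name
        try:
            write_conf(conf, ["# example"])
            failed = False
        except FileNotFoundError:
            # open(..., "w") does not create the missing parent directory
            failed = True
        # Assumed workaround for illustration: create the directory, then retry.
        os.makedirs(conf_dir, exist_ok=True)
        write_conf(conf, ["# example"])
        return failed, os.path.isfile(conf)


if __name__ == "__main__":
    print(demo())
```

The first write fails because the directory is missing, and the second succeeds once it exists.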

2. ceph orch ps --daemon-type osd --format yaml

Output: ...snip...

3. ceph auth add osd.0 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring

I verified that the file 
/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring 
does exist.

Output:

Error EINVAL: caps cannot be specified both in keyring and in command

You only need to create the keyring; you don't need to store it 
anywhere. I'd still suggest creating the keyring somehow, 
but I haven't seen this particular error before.
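
The EINVAL message says that caps were supplied twice: once inside the keyring file passed with `-i` and once on the command line. A minimal sketch for checking whether a keyring file already declares caps follows. It assumes the usual INI-like keyring layout (an `[entity]` header followed by `key = ...` and `caps <service> = "..."` lines). The function `caps_in_keyring` and the sample text are invented for illustration. If the file does carry caps, one thing to try (an assumption, not verified in this thread) is dropping the `osd '...' mon '...'` arguments from `ceph auth add` so that only the file supplies them.

```python
from typing import Dict, List, Optional


def caps_in_keyring(text: str) -> Dict[str, List[str]]:
    """Map each entity in a keyring's text to the cap lines declared for it."""
    caps: Dict[str, List[str]] = {}
    entity: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Section header such as [osd.0] starts a new entity
            entity = line[1:-1]
            caps.setdefault(entity, [])
        elif entity is not None and line.startswith("caps "):
            caps[entity].append(line)
    return caps


# Made-up sample; the key is a placeholder, not a real secret.
SAMPLE = """\
[osd.0]
\tkey = PLACEHOLDER==
\tcaps mon = "allow rwx"
\tcaps osd = "allow *"
"""

if __name__ == "__main__":
    for name, lines in caps_in_keyring(SAMPLE).items():
        print(name, "has caps in keyring" if lines else "has no caps in keyring")
```

To check a real file, read its contents and pass the text to `caps_in_keyring` in place of `SAMPLE`.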

hth

Sebastian

Thanks

On 8/30/21 10:28, Sebastian Wagner wrote:
Could you run

1. ceph orch ls --service-type osd --format yaml

2. ceph orch ps --daemon-type osd --format yaml

3. try running the `ceph auth add` call from 
https://docs.ceph.com/en/mimic/rados/operations/add-or-rm-osds/#adding-an-osd-manual 

On 30.08.21 at 14:49, Alcatraz wrote:
Hello all,

Running into some issues trying to build a virtual PoC for Ceph. 
Went to my cloud provider of choice and spun up some nodes. I have 
three identical hosts consisting of:

Debian 10
8 cpu cores
16GB RAM
1x315GB Boot Drive
3x400GB Data drives

After deploying Ceph (v 16.2.5) using cephadm, adding hosts, and 
logging into the dashboard, Ceph showed 9 OSDs, 0 up, 9 in. I 
thought perhaps it just needed some time to bring up the OSDs, so 
I left it running overnight.

This morning, I checked, and the Ceph dashboard shows 9 OSDs, 0 
up, 6 in, 3 out. I find this odd, as it hasn't been touched since 
it was deployed. Ceph health shows "HEALTH_OK", `ceph osd tree` 
outputs:

ID  CLASS  WEIGHT  TYPE NAME     STATUS  REWEIGHT  PRI-AFF
-1              0  root default
 0              0  osd.0           down         0  1.00000
 1              0  osd.1           down         0  1.00000
 2              0  osd.2           down         0  1.00000
 3              0  osd.3           down   1.00000  1.00000
 4              0  osd.4           down   1.00000  1.00000
 5              0  osd.5           down   1.00000  1.00000
 6              0  osd.6           down   1.00000  1.00000
 7              0  osd.7           down   1.00000  1.00000
 8              0  osd.8           down   1.00000  1.00000

and if I run `ls /var/run/ceph` the only thing it outputs is 
"d1405594-0944-11ec-8ebc-f23c92edc936" (sans quotes), which I 
assume is the cluster ID? So of course, if I run `ceph daemon 
osd.8 help` for example, it just returns:

Can't get admin socket path: unable to get conf option 
admin_socket for osd: b"error parsing 'osd': expected string of 
the form TYPE.ID, valid types are: auth, mon, osd, mds, mgr, 
client\n"

If I look at the log within the Ceph dashboard, no errors or 
warnings appear. Will Ceph not work on virtual hardware? Is there 
something I need to do to bring up the OSDs?

Just as I was about to send this email, I checked the logs, 
which show the following (traceback omitted for length):

8/30/21 7:44:15 AM[ERR]Failed to apply osd.all-available-devices 
spec 
DriveGroupSpec(name=all-available-devices->placement=PlacementSpec(host_pattern='*'), 
service_id='all-available-devices', service_type='osd', 
data_devices=DeviceSelection(all=True), osd_id_claims={}, 
unmanaged=False, filter_logic='AND', preview_only=False): auth get 
failed: failed to find osd.6 in keyring retval: -2

8/30/21 7:45:19 AM[ERR]executing create_from_spec_one(([('ceph01', 
<ceph.deployment.drive_selection.selector.DriveSelection object at 
0x7f63a930bf98>), ('ceph02', 
<ceph.deployment.drive_selection.selector.DriveSelection object at 
0x7f63a81ac8d0>), ('ceph03', 
<ceph.deployment.drive_selection.selector.DriveSelection object at 
0x7f63a930b0b8>)],)) failed.

and similar for the other OSDs. I'm not sure why it's complaining 
about auth, because in order to even add the hosts to the cluster 
I had to copy the ceph public key to the hosts to begin with.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
