Sebastian,
Thanks for responding! And of course.
1. ceph orch ls --service-type osd --format yaml
Output:
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  host_pattern: '*'
unmanaged: true
spec:
  data_devices:
    all: true
  filter_logic: AND
  objectstore: bluestore
status:
  created: '2021-08-30T13:57:51.000178Z'
  last_refresh: '2021-08-30T15:24:10.534710Z'
  running: 0
  size: 6
events:
- 2021-08-30T03:48:01.652108Z service:osd.all-available-devices [INFO] "service was created"
- 2021-08-30T03:49:00.267808Z service:osd.all-available-devices [ERROR] "Failed to apply: cephadm exited with an error code: 1, stderr:Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0
  /usr/bin/docker: stdout
  /usr/bin/docker: stderr Error: No such container: ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0
  Deploy daemon osd.0 ...
  Traceback (most recent call last):
    File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8230, in <module>
      main()
    File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8218, in main
      r = ctx.func(ctx)
    File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 1759, in _default_image
      return func(ctx)
    File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 4326, in command_deploy
      ports=daemon_ports)
    File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2632, in deploy_daemon
      c, osd_fsid=osd_fsid, ports=ports)
    File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2801, in deploy_daemon_units
      install_sysctl(ctx, fsid, daemon_type)
    File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2963, in install_sysctl
      _write(conf, lines)
    File "/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2948, in _write
      with open(conf, 'w') as f:
  FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/sysctl.d/90-ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.conf'"
- 2021-08-30T03:49:08.356762Z service:osd.all-available-devices [ERROR] "Failed to apply: auth get failed: failed to find osd.0 in keyring retval: -2"
- 2021-08-30T03:52:34.100977Z service:osd.all-available-devices [ERROR] "Failed to apply: auth get failed: failed to find osd.3 in keyring retval: -2"
- 2021-08-30T03:52:42.260439Z service:osd.all-available-devices [ERROR] "Failed to apply: auth get failed: failed to find osd.6 in keyring retval: -2"
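A note on the first ERROR event above: the FileNotFoundError looks like cephadm tried to write its sysctl snippet into /usr/lib/sysctl.d, and my guess is that this directory simply doesn't exist on these Debian 10 hosts. I was planning to check that on each node with something like the following (the mkdir is only my own assumed workaround, not something I've seen in the docs):

ls -ld /usr/lib/sysctl.d
sudo mkdir -p /usr/lib/sysctl.d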
2. ceph orch ps --daemon-type osd --format yaml
Output:
daemon_type: osd
daemon_id: '0'
service_name: osd.all-available-devices
hostname: ceph01
container_image_name: docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb
memory_request: 4294967296
status: 0
status_desc: stopped
osdspec_affinity: ''
is_active: false
ports: []
last_refresh: '2021-08-30T15:25:15.214407Z'
created: '2021-08-30T03:49:11.403333Z'
events:
- 2021-08-30T03:49:11.439359Z daemon:osd.0 [INFO] "Reconfigured osd.0 on host 'ceph01'"
---
daemon_type: osd
daemon_id: '3'
service_name: osd.all-available-devices
hostname: ceph02
container_image_name: docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb
memory_request: 4294967296
status: 0
status_desc: stopped
osdspec_affinity: ''
is_active: false
ports: []
last_refresh: '2021-08-30T15:24:10.692430Z'
created: '2021-08-30T03:52:48.919854Z'
events:
- 2021-08-30T03:52:48.966369Z daemon:osd.3 [INFO] "Reconfigured osd.3 on host 'ceph02'"
---
daemon_type: osd
daemon_id: '6'
service_name: osd.all-available-devices
hostname: ceph03
container_image_name: docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb
memory_request: 4294967296
status: 0
status_desc: stopped
osdspec_affinity: ''
is_active: false
ports: []
last_refresh: '2021-08-30T15:24:10.534710Z'
created: '2021-08-30T03:52:50.818025Z'
events:
- 2021-08-30T03:52:50.862937Z daemon:osd.6 [INFO] "Reconfigured osd.6 on host 'ceph03'"
3. ceph auth add osd.0 osd 'allow *' mon 'allow rwx' -i
/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring
I verified that the file
/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring
does exist.
Output:
Error EINVAL: caps cannot be specified both in keyring and in command
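My reading of that EINVAL is that the keyring file already contains caps, so passing caps on the command line as well gets rejected. If that's right, I'm assuming one of these is what I should run instead, but please correct me if not:

ceph auth add osd.0 -i /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring
ceph auth import -i /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring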
Thanks
On 8/30/21 10:28, Sebastian Wagner wrote:
Could you run
1. ceph orch ls --service-type osd --format yaml
2. ceph orch ps --daemon-type osd --format yaml
3. try running the `ceph auth add` call from
https://docs.ceph.com/en/mimic/rados/operations/add-or-rm-osds/#adding-an-osd-manual
Am 30.08.21 um 14:49 schrieb Alcatraz:
Hello all,
Running into some issues trying to build a virtual PoC for Ceph. Went
to my cloud provider of choice and spun up some nodes. I have three
identical hosts consisting of:
Debian 10
8 cpu cores
16GB RAM
1x315GB Boot Drive
3x400GB Data drives
After deploying Ceph (v 16.2.5) using cephadm, adding hosts, and
logging into the dashboard, Ceph showed 9 OSDs, 0 up, 9 in. I thought
perhaps it just needed some time to bring up the OSDs, so I left it
running overnight.
This morning, I checked, and the Ceph dashboard shows 9 OSDs, 0 up, 6
in, 3 out. I find this odd, as it hasn't been touched since it was
deployed. Ceph health shows "HEALTH_OK", `ceph osd tree` outputs:
ID  CLASS  WEIGHT  TYPE NAME     STATUS  REWEIGHT  PRI-AFF
-1              0  root default
 0              0  osd.0           down         0  1.00000
 1              0  osd.1           down         0  1.00000
 2              0  osd.2           down         0  1.00000
 3              0  osd.3           down   1.00000  1.00000
 4              0  osd.4           down   1.00000  1.00000
 5              0  osd.5           down   1.00000  1.00000
 6              0  osd.6           down   1.00000  1.00000
 7              0  osd.7           down   1.00000  1.00000
 8              0  osd.8           down   1.00000  1.00000
and if I run `ls /var/run/ceph` the only thing it outputs is
"d1405594-0944-11ec-8ebc-f23c92edc936" (sans quotes), which I assume
is the cluster ID? So of course, if I run `ceph daemon osd.8 help`
for example, it just returns:
Can't get admin socket path: unable to get conf option admin_socket
for osd: b"error parsing 'osd': expected string of the form TYPE.ID,
valid types are: auth, mon, osd, mds, mgr, client\n"
If I look at the log within the Ceph dashboard, no errors or warnings
appear. Will Ceph not work on virtual hardware? Is there something I
need to do to bring up the OSDs?
Just as I was about to send this email, I went to check the logs again and
they show the following (traceback omitted for length):
8/30/21 7:44:15 AM[ERR]Failed to apply osd.all-available-devices spec
DriveGroupSpec(name=all-available-devices->placement=PlacementSpec(host_pattern='*'),
service_id='all-available-devices', service_type='osd',
data_devices=DeviceSelection(all=True), osd_id_claims={},
unmanaged=False, filter_logic='AND', preview_only=False): auth get
failed: failed to find osd.6 in keyring retval: -2
8/30/21 7:45:19 AM[ERR]executing create_from_spec_one(([('ceph01',
<ceph.deployment.drive_selection.selector.DriveSelection object at
0x7f63a930bf98>), ('ceph02',
<ceph.deployment.drive_selection.selector.DriveSelection object at
0x7f63a81ac8d0>), ('ceph03',
<ceph.deployment.drive_selection.selector.DriveSelection object at
0x7f63a930b0b8>)],)) failed.
and similar for the other OSDs. I'm not sure why it's complaining
about auth, because in order to even add the hosts to the cluster I
had to copy the ceph public key to the hosts to begin with.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx