Brand New Cephadm Deployment, OSDs show either in/down or out/down

Hello all,

I'm running into some issues trying to build a virtual PoC for Ceph. I went to my cloud provider of choice and spun up some nodes; I have three identical hosts, each consisting of:

Debian 10
8 CPU cores
16 GB RAM
1x 315 GB boot drive
3x 400 GB data drives

After deploying Ceph (v16.2.5) with cephadm, adding the hosts, and logging into the dashboard, the cluster showed 9 OSDs: 0 up, 9 in. I figured it might just need some time to bring up the OSDs, so I left it running overnight.
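
For reference, the deployment itself followed the standard cephadm flow, roughly the following (the monitor IP below is a placeholder, not my real address):

# bootstrap the first node
cephadm bootstrap --mon-ip 10.0.0.11

# copy the cluster's SSH key to the other hosts, then add them
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph02
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph03
ceph orch host add ceph02
ceph orch host add ceph03

# let the orchestrator turn every available data device into an OSD
ceph orch apply osd --all-available-devices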

This morning I checked, and the Ceph dashboard shows 9 OSDs: 0 up, 6 in, 3 out. I find this odd, as the cluster hasn't been touched since it was deployed. `ceph health` shows "HEALTH_OK", and `ceph osd tree` outputs:

ID  CLASS  WEIGHT  TYPE NAME     STATUS  REWEIGHT  PRI-AFF
-1              0  root default
 0              0  osd.0           down         0  1.00000
 1              0  osd.1           down         0  1.00000
 2              0  osd.2           down         0  1.00000
 3              0  osd.3           down   1.00000  1.00000
 4              0  osd.4           down   1.00000  1.00000
 5              0  osd.5           down   1.00000  1.00000
 6              0  osd.6           down   1.00000  1.00000
 7              0  osd.7           down   1.00000  1.00000
 8              0  osd.8           down   1.00000  1.00000
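
For what it's worth, I was also planning to compare that against what the orchestrator thinks is running, with something like:

ceph orch ps --daemon-type osd   # list the OSD daemons cephadm has deployed and their state
ceph osd stat                    # quick up/in summary from the monitors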

If I run `ls /var/run/ceph`, the only thing it outputs is "d1405594-0944-11ec-8ebc-f23c92edc936" (sans quotes), which I assume is the cluster ID? And if I then run, for example, `ceph daemon osd.8 help`, it just returns:

Can't get admin socket path: unable to get conf option admin_socket for osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid types are: auth, mon, osd, mds, mgr, client\n"
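
I realize that with cephadm the daemons run in containers, so I assume the admin sockets (if they exist at all while the OSDs are down) live under that fsid directory rather than directly in /var/run/ceph. My next attempt will be along these lines (the .asok filename is just my guess at the usual naming):

# enter the osd.8 container, if it is actually running
cephadm enter --name osd.8

# or point ceph at the socket path under the fsid directory from the host
ceph daemon /var/run/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/ceph-osd.8.asok help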

If I look at the log within the Ceph dashboard, no errors or warnings appear. Will Ceph not work on virtual hardware? Is there something I need to do to bring up the OSDs?
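
For completeness, the other things I plan to check are whether the drives are even being seen as available, and what the OSD units log on the hosts, e.g.:

ceph orch device ls         # should list the 3x 400 GB data drives per host and whether they are available
cephadm logs --name osd.0   # journald output for one of the down OSDs, run on the host that owns it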

Just as I was about to send this email, I went to check the logs again, and they show the following (traceback omitted for length):

8/30/21 7:44:15 AM[ERR]Failed to apply osd.all-available-devices spec DriveGroupSpec(name=all-available-devices->placement=PlacementSpec(host_pattern='*'), service_id='all-available-devices', service_type='osd', data_devices=DeviceSelection(all=True), osd_id_claims={}, unmanaged=False, filter_logic='AND', preview_only=False): auth get failed: failed to find osd.6 in keyring retval: -2

8/30/21 7:45:19 AM[ERR]executing create_from_spec_one(([('ceph01', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a930bf98>), ('ceph02', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a81ac8d0>), ('ceph03', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a930b0b8>)],)) failed.

and similar errors for the other OSDs. I'm not sure why it's complaining about auth, because in order to even add the hosts to the cluster I had to copy the ceph public key to them to begin with.
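
Since the error specifically says it failed to find osd.6 in the keyring, I'm also going to check whether those OSDs ever got cephx keys registered at all, e.g.:

ceph auth ls | grep '^osd\.'   # list whichever osd.* entities exist in the cluster
ceph auth get osd.6            # the entity the error says is missing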
