Re: Brand New Cephadm Deployment, OSDs show either in/down or out/down

On 30.08.21 at 17:39, Alcatraz wrote:
Sebastian,


Thanks for responding! And of course.


1. ceph orch ls --service-type osd --format yaml

Output:

service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  host_pattern: '*'
unmanaged: true
spec:
  data_devices:
    all: true
  filter_logic: AND
  objectstore: bluestore
status:
  created: '2021-08-30T13:57:51.000178Z'
  last_refresh: '2021-08-30T15:24:10.534710Z'
  running: 0
  size: 6
events:
- 2021-08-30T03:48:01.652108Z service:osd.all-available-devices [INFO] "service was
  created"
- "2021-08-30T03:49:00.267808Z service:osd.all-available-devices [ERROR] \"Failed\
  \ to apply: cephadm exited with an error code: 1, stderr:Non-zero exit code 1 from\
  \ /usr/bin/docker container inspect --format {{.State.Status}} ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0\n\
  /usr/bin/docker: stdout \n/usr/bin/docker: stderr Error: No such container: ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0\n\
  Deploy daemon osd.0 ...\nTraceback (most recent call last):\n File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 8230, in <module>\n    main()\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 8218, in main\n    r = ctx.func(ctx)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 1759, in _default_image\n    return func(ctx)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 4326, in command_deploy\n    ports=daemon_ports)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 2632, in deploy_daemon\n    c, osd_fsid=osd_fsid, ports=ports)\n  File \"\
  /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 2801, in deploy_daemon_units\n    install_sysctl(ctx, fsid, daemon_type)\n\
  \  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 2963, in install_sysctl\n    _write(conf, lines)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 2948, in _write\n    with open(conf, 'w') as f:\nFileNotFoundError: [Errno\
  \ 2] No such file or directory: '/usr/lib/sysctl.d/90-ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.conf'\""

https://tracker.ceph.com/issues/52481

- '2021-08-30T03:49:08.356762Z service:osd.all-available-devices [ERROR] "Failed to
  apply: auth get failed: failed to find osd.0 in keyring retval: -2"'
- '2021-08-30T03:52:34.100977Z service:osd.all-available-devices [ERROR] "Failed to
  apply: auth get failed: failed to find osd.3 in keyring retval: -2"'
- '2021-08-30T03:52:42.260439Z service:osd.all-available-devices [ERROR] "Failed to
  apply: auth get failed: failed to find osd.6 in keyring retval: -2"'

Will be fixed by https://github.com/ceph/ceph/pull/42989
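
For reference, the first [ERROR] event above ends in FileNotFoundError because open(conf, 'w') cannot create a file whose parent directory (/usr/lib/sysctl.d in the traceback) does not exist. The sketch below reproduces that failure mode inside a temporary directory. The names write_conf and demo are hypothetical, the file content is a placeholder, and this is only an illustration of the failure, not the upstream fix.

```python
import os
import tempfile


def write_conf(conf_path, lines, create_parent=False):
    """Write lines to conf_path, optionally creating the parent directory first."""
    if create_parent:
        os.makedirs(os.path.dirname(conf_path), exist_ok=True)
    with open(conf_path, "w") as f:
        f.write("\n".join(lines) + "\n")


def demo():
    """Return (first_write_failed, file_exists_after_retry)."""
    # A temporary directory stands in for /usr/lib so the real system is untouched.
    with tempfile.TemporaryDirectory() as root:
        conf = os.path.join(root, "sysctl.d", "90-ceph-example-osd.conf")
        lines = ["# placeholder sysctl settings"]
        try:
            write_conf(conf, lines)  # parent directory "sysctl.d" is missing
            failed = False
        except FileNotFoundError:
            failed = True
        write_conf(conf, lines, create_parent=True)  # succeeds once the directory exists
        return failed, os.path.exists(conf)


if __name__ == "__main__":
    print(demo())
```

If this is what happens on the hosts, creating the missing directory on each host before redeploying may be enough as a workaround, though that is an assumption and has not been verified against this cluster.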




2. ceph orch ps --daemon-type osd --format yaml

Output: ...snip...

3. ceph auth add osd.0 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring

I verified /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring file does exist.

Output:

Error EINVAL: caps cannot be specified both in keyring and in command


You only need to create the keyring; you don't need to store it anywhere. I'd still suggest creating the keyring somehow, but I haven't seen this particular error before.
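
One possible reading of the EINVAL message is that the keyring file already contains `caps` lines, so passing caps on the command line as well is rejected. The sketch below checks a keyring's text for caps under a given entity. The function name keyring_caps and the sample keyring (including its obviously fake key) are hypothetical and only illustrate the usual INI-like keyring layout.

```python
def keyring_caps(keyring_text, entity):
    """Return {daemon_type: capability} for the caps lines in the [entity] section."""
    caps = {}
    current = None
    for raw in keyring_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()  # entering a new entity section
            continue
        if current == entity and line.startswith("caps ") and "=" in line:
            name, value = line.split("=", 1)
            caps[name[len("caps "):].strip()] = value.strip().strip('"')
    return caps


# Hypothetical sample; the key is a placeholder, not a real secret.
SAMPLE = """\
[osd.0]
\tkey = PLACEHOLDER-NOT-A-REAL-KEY
\tcaps mon = "allow profile osd"
\tcaps osd = "allow *"
"""

if __name__ == "__main__":
    print(keyring_caps(SAMPLE, "osd.0"))
```

If the result is non-empty for the real file, running `ceph auth add osd.0 -i <keyring>` without the caps arguments is probably the variant to try.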


hth

Sebastian



Thanks

On 8/30/21 10:28, Sebastian Wagner wrote:
Could you run

1. ceph orch ls --service-type osd --format yaml

2. ceph orch ps --daemon-type osd --format yaml

3. try running the `ceph auth add` call from https://docs.ceph.com/en/mimic/rados/operations/add-or-rm-osds/#adding-an-osd-manual


On 30.08.21 at 14:49, Alcatraz wrote:
Hello all,

Running into some issues trying to build a virtual PoC for Ceph. Went to my cloud provider of choice and spun up some nodes. I have three identical hosts consisting of:

Debian 10
8 cpu cores
16GB RAM
1x315GB Boot Drive
3x400GB Data drives

After deploying Ceph (v 16.2.5) using cephadm, adding hosts, and logging into the dashboard, Ceph showed 9 OSDs, 0 up, 9 in. I thought perhaps it just needed some time to bring up the OSDs, so I left it running overnight.

This morning I checked, and the Ceph dashboard shows 9 OSDs, 0 up, 6 in, 3 out. I find this odd, as it hasn't been touched since it was deployed. Ceph health shows "HEALTH_OK", and `ceph osd tree` outputs:

ID  CLASS  WEIGHT  TYPE NAME     STATUS  REWEIGHT  PRI-AFF
-1              0  root default
 0              0  osd.0           down         0  1.00000
 1              0  osd.1           down         0  1.00000
 2              0  osd.2           down         0  1.00000
 3              0  osd.3           down   1.00000  1.00000
 4              0  osd.4           down   1.00000  1.00000
 5              0  osd.5           down   1.00000  1.00000
 6              0  osd.6           down   1.00000  1.00000
 7              0  osd.7           down   1.00000  1.00000
 8              0  osd.8           down   1.00000  1.00000

and if I run `ls /var/run/ceph` the only thing it outputs is "d1405594-0944-11ec-8ebc-f23c92edc936" (sans quotes), which I assume is the cluster ID? So of course, if I run `ceph daemon osd.8 help` for example, it just returns:

Can't get admin socket path: unable to get conf option admin_socket for osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid types are: auth, mon, osd, mds, mgr, client\n"

If I look at the log within the Ceph dashboard, no errors or warnings appear. Will Ceph not work on virtual hardware? Is there something I need to do to bring up the OSDs?
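
The dashboard numbers can be cross-checked against the `ceph osd tree` output quoted above: the STATUS column gives up/down, and a REWEIGHT of 0 means the OSD is marked out. A minimal sketch that tallies the pasted table follows; summarize_osd_tree is a hypothetical helper and it assumes the plain-text column layout shown in this message.

```python
TREE = """\
ID  CLASS  WEIGHT  TYPE NAME     STATUS  REWEIGHT  PRI-AFF
-1              0  root default
 0              0  osd.0           down         0  1.00000
 1              0  osd.1           down         0  1.00000
 2              0  osd.2           down         0  1.00000
 3              0  osd.3           down   1.00000  1.00000
 4              0  osd.4           down   1.00000  1.00000
 5              0  osd.5           down   1.00000  1.00000
 6              0  osd.6           down   1.00000  1.00000
 7              0  osd.7           down   1.00000  1.00000
 8              0  osd.8           down   1.00000  1.00000
"""


def summarize_osd_tree(text):
    """Count OSDs by up/down (STATUS) and in/out (REWEIGHT > 0 means in)."""
    summary = {"total": 0, "up": 0, "down": 0, "in": 0, "out": 0}
    for line in text.splitlines():
        tokens = line.split()
        for i, token in enumerate(tokens):
            # The two fields after the osd.N name are STATUS and REWEIGHT.
            if token.startswith("osd.") and i + 2 < len(tokens):
                status = tokens[i + 1]
                reweight = float(tokens[i + 2])
                summary["total"] += 1
                summary["up" if status == "up" else "down"] += 1
                summary["in" if reweight > 0 else "out"] += 1
                break
    return summary


if __name__ == "__main__":
    print(summarize_osd_tree(TREE))
```

For this table the tally is 9 OSDs, 0 up, 6 in, 3 out, which matches what the dashboard reports.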

Just as I was about to send this email, I went to check the logs and they show the following (traceback omitted for length):

8/30/21 7:44:15 AM[ERR]Failed to apply osd.all-available-devices spec DriveGroupSpec(name=all-available-devices->placement=PlacementSpec(host_pattern='*'), service_id='all-available-devices', service_type='osd', data_devices=DeviceSelection(all=True), osd_id_claims={}, unmanaged=False, filter_logic='AND', preview_only=False): auth get failed: failed to find osd.6 in keyring retval: -2

8/30/21 7:45:19 AM[ERR]executing create_from_spec_one(([('ceph01', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a930bf98>), ('ceph02', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a81ac8d0>), ('ceph03', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a930b0b8>)],)) failed.

and similar for the other OSDs. I'm not sure why it's complaining about auth, because in order to even add the hosts to the cluster I had to copy the ceph public key to the hosts to begin with.
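
Note that the "keyring" in these messages refers to the cluster's cephx auth entries for each OSD, not to the SSH public key copied to the hosts. To see which OSD entities the orchestrator could not find, the error lines can be scanned with a small regular expression. The sketch below does that; missing_keyring_entities is a hypothetical helper, and the sample text is shortened from the messages quoted in this thread.

```python
import re

# Matches the entity name in "failed to find osd.N in keyring".
PATTERN = re.compile(r"failed to find (osd\.\d+) in keyring")

# Shortened copies of the error messages quoted in this thread.
SAMPLE_LOG = """\
Failed to apply: auth get failed: failed to find osd.0 in keyring retval: -2
Failed to apply: auth get failed: failed to find osd.3 in keyring retval: -2
Failed to apply: auth get failed: failed to find osd.6 in keyring retval: -2
Failed to apply osd.all-available-devices spec: auth get failed: failed to find osd.6 in keyring retval: -2
"""


def missing_keyring_entities(log_text):
    """Return the OSD entities named in the errors, in first-seen order, without repeats."""
    seen = []
    for match in PATTERN.finditer(log_text):
        entity = match.group(1)
        if entity not in seen:
            seen.append(entity)
    return seen


if __name__ == "__main__":
    print(missing_keyring_entities(SAMPLE_LOG))
```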

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


