Re: Brand New Cephadm Deployment, OSDs show either in/down or out/down

Sebastian,


Thanks for responding! And of course.


1. ceph orch ls --service-type osd --format yaml

Output:

service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  host_pattern: '*'
unmanaged: true
spec:
  data_devices:
    all: true
  filter_logic: AND
  objectstore: bluestore
status:
  created: '2021-08-30T13:57:51.000178Z'
  last_refresh: '2021-08-30T15:24:10.534710Z'
  running: 0
  size: 6
events:
- 2021-08-30T03:48:01.652108Z service:osd.all-available-devices [INFO] "service was
  created"
- "2021-08-30T03:49:00.267808Z service:osd.all-available-devices [ERROR] \"Failed\   \ to apply: cephadm exited with an error code: 1, stderr:Non-zero exit code 1 from\   \ /usr/bin/docker container inspect --format {{.State.Status}} ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0\n\   /usr/bin/docker: stdout \n/usr/bin/docker: stderr Error: No such container: ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0\n\   Deploy daemon osd.0 ...\nTraceback (most recent call last):\n File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\   , line 8230, in <module>\n    main()\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\   , line 8218, in main\n    r = ctx.func(ctx)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\   , line 1759, in _default_image\n    return func(ctx)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\   , line 4326, in command_deploy\n    ports=daemon_ports)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\   , line 2632, in deploy_daemon\n    c, osd_fsid=osd_fsid, ports=ports)\n  File \"\
/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
  , line 2801, in deploy_daemon_units\n    install_sysctl(ctx, fsid, daemon_type)\n\   \  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\   , line 2963, in install_sysctl\n    _write(conf, lines)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\   , line 2948, in _write\n    with open(conf, 'w') as f:\nFileNotFoundError: [Errno\   \ 2] No such file or directory: '/usr/lib/sysctl.d/90-ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.conf'\"" - '2021-08-30T03:49:08.356762Z service:osd.all-available-devices [ERROR] "Failed to
  apply: auth get failed: failed to find osd.0 in keyring retval: -2"'
- '2021-08-30T03:52:34.100977Z service:osd.all-available-devices [ERROR] "Failed to
  apply: auth get failed: failed to find osd.3 in keyring retval: -2"'
- '2021-08-30T03:52:42.260439Z service:osd.all-available-devices [ERROR] "Failed to
  apply: auth get failed: failed to find osd.6 in keyring retval: -2"'
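
For what it's worth, the FileNotFoundError in the first [ERROR] event above seems to point at /usr/lib/sysctl.d not existing on the OSD hosts, which would also explain why docker never finds an osd container. A rough check/workaround I'm considering, assuming that directory really is missing (and assuming `ceph orch daemon redeploy` is available in this release):

# on each OSD host: confirm the directory cephadm tries to write into exists
ls -ld /usr/lib/sysctl.d || mkdir -p /usr/lib/sysctl.d

# then ask the orchestrator to retry a failed daemon, e.g.
ceph orch daemon redeploy osd.0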


2. ceph orch ps --daemon-type osd --format yaml

Output:

daemon_type: osd
daemon_id: '0'
service_name: osd.all-available-devices
hostname: ceph01
container_image_name: docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb
memory_request: 4294967296
status: 0
status_desc: stopped
osdspec_affinity: ''
is_active: false
ports: []
last_refresh: '2021-08-30T15:25:15.214407Z'
created: '2021-08-30T03:49:11.403333Z'
events:
- 2021-08-30T03:49:11.439359Z daemon:osd.0 [INFO] "Reconfigured osd.0 on host 'ceph01'"
---
daemon_type: osd
daemon_id: '3'
service_name: osd.all-available-devices
hostname: ceph02
container_image_name: docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb
memory_request: 4294967296
status: 0
status_desc: stopped
osdspec_affinity: ''
is_active: false
ports: []
last_refresh: '2021-08-30T15:24:10.692430Z'
created: '2021-08-30T03:52:48.919854Z'
events:
- 2021-08-30T03:52:48.966369Z daemon:osd.3 [INFO] "Reconfigured osd.3 on host 'ceph02'"
---
daemon_type: osd
daemon_id: '6'
service_name: osd.all-available-devices
hostname: ceph03
container_image_name: docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb
memory_request: 4294967296
status: 0
status_desc: stopped
osdspec_affinity: ''
is_active: false
ports: []
last_refresh: '2021-08-30T15:24:10.534710Z'
created: '2021-08-30T03:52:50.818025Z'
events:
- 2021-08-30T03:52:50.862937Z daemon:osd.6 [INFO] "Reconfigured osd.6 on host 'ceph03'"
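
All three daemons report status_desc: stopped, so the useful detail is probably in the per-daemon logs rather than here. Something along these lines should pull them (fsid taken from the output above; the exact unit name is my assumption):

# on the host that should be running the daemon
cephadm logs --fsid d1405594-0944-11ec-8ebc-f23c92edc936 --name osd.0

# or via the systemd unit cephadm creates for each daemon
journalctl -u ceph-d1405594-0944-11ec-8ebc-f23c92edc936@osd.0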

3. ceph auth add osd.0 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring

I verified that the /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring file does exist.

Output:

Error EINVAL: caps cannot be specified both in keyring and in command
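
If I'm reading that right, the keyring file already contains its own caps, so they can only be supplied in one place. The two variants I'd try next (a sketch, not yet verified on this cluster):

# variant 1: keep the caps that are in the keyring file, drop them from the command line
ceph auth add osd.0 -i /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring

# variant 2: import the keyring as-is, creating/updating the entity with the caps it contains
ceph auth import -i /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring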

Thanks

On 8/30/21 10:28, Sebastian Wagner wrote:
Could you run

1. ceph orch ls --service-type osd --format yaml

2. ceph orch ps --daemon-type osd --format yaml

3. try running the `ceph auth add` call from https://docs.ceph.com/en/mimic/rados/operations/add-or-rm-osds/#adding-an-osd-manual


Am 30.08.21 um 14:49 schrieb Alcatraz:
Hello all,

Running into some issues trying to build a virtual PoC for Ceph. I went to my cloud provider of choice and spun up some nodes. I have three identical hosts, each consisting of:

Debian 10
8 CPU cores
16GB RAM
1x 315GB boot drive
3x 400GB data drives

After I deployed Ceph (v16.2.5) using cephadm, added the hosts, and logged into the dashboard, it showed 9 OSDs, 0 up, 9 in. I thought perhaps it just needed some time to bring up the OSDs, so I left it running overnight.

This morning I checked, and the Ceph dashboard shows 9 OSDs, 0 up, 6 in, 3 out. I find this odd, as the cluster hasn't been touched since it was deployed. Ceph health shows "HEALTH_OK", and `ceph osd tree` outputs:

ID  CLASS  WEIGHT  TYPE NAME     STATUS  REWEIGHT  PRI-AFF
-1              0  root default
 0              0  osd.0           down         0  1.00000
 1              0  osd.1           down         0  1.00000
 2              0  osd.2           down         0  1.00000
 3              0  osd.3           down   1.00000  1.00000
 4              0  osd.4           down   1.00000  1.00000
 5              0  osd.5           down   1.00000  1.00000
 6              0  osd.6           down   1.00000  1.00000
 7              0  osd.7           down   1.00000  1.00000
 8              0  osd.8           down   1.00000  1.00000

and if I run `ls /var/run/ceph`, the only thing it outputs is "d1405594-0944-11ec-8ebc-f23c92edc936" (sans quotes), which I assume is the cluster ID. So of course, if I run `ceph daemon osd.8 help`, for example, it just returns:

Can't get admin socket path: unable to get conf option admin_socket for osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid types are: auth, mon, osd, mds, mgr, client\n"
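
If I understand cephadm correctly, the admin sockets would live under /var/run/ceph/<fsid>/ rather than in /var/run/ceph itself, so `ceph daemon` has to be pointed at the socket path; assuming the usual ceph-osd.<id>.asok naming, something like:

# list any admin sockets cephadm has created for this cluster
ls /var/run/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/

# ceph daemon also accepts a socket path instead of a daemon name
ceph daemon /var/run/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/ceph-osd.8.asok help

though if the OSDs never actually started, there may be no socket there at all.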

If I look at the log within the Ceph dashboard, no errors or warnings appear. Will Ceph not work on virtual hardware? Is there something I need to do to bring up the OSDs?

Just as I was about to send this email, I checked the logs again, and they show the following (traceback omitted for length):

8/30/21 7:44:15 AM[ERR]Failed to apply osd.all-available-devices spec DriveGroupSpec(name=all-available-devices->placement=PlacementSpec(host_pattern='*'), service_id='all-available-devices', service_type='osd', data_devices=DeviceSelection(all=True), osd_id_claims={}, unmanaged=False, filter_logic='AND', preview_only=False): auth get failed: failed to find osd.6 in keyring retval: -2

8/30/21 7:45:19 AM[ERR]executing create_from_spec_one(([('ceph01', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a930bf98>), ('ceph02', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a81ac8d0>), ('ceph03', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a930b0b8>)],)) failed.

and similar errors for the other OSDs. I'm not sure why it's complaining about auth, because in order to even add the hosts to the cluster I had to copy the ceph public key to them to begin with.
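
If it helps narrow this down, my understanding (which may be wrong) is that the key copied for adding hosts is cephadm's SSH key, which is separate from the per-daemon cephx keys these errors refer to. A quick way to see whether the monitors know about any osd.* entities at all:

# cephx entities currently registered with the monitors
ceph auth ls | grep '^osd\.'

# or a single entity, e.g. the one from the error above
ceph auth get osd.6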

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



