Re: Brand New Cephadm Deployment, OSDs show either in/down or out/down

Debian 10 is not on the recommended platform list for Ceph.
Maybe it's a problem caused by the change from sysvinit to systemd?
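
One way to rule out missing host prerequisites: cephadm ships a built-in host check you can run directly on each node, e.g.

    cephadm check-host

which should flag missing dependencies (container engine, systemd, time sync, lvm2) before any daemons are deployed.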
-- 
Kind regards from Oberhausen
Daniel Tönnißen
(System Administrator)
KAMP Netzwerkdienste GmbH
Vestische Str. 89−91 | 46117 Oberhausen 
Phone: +49 (0) 208.89 402-50
Fax: +49 (0) 208.89 402-40
E-Mail: dt@xxxxxxx
WWW: https://www.kamp.de
Managing Directors: Heiner Lante | Michael Lante | District Court Duisburg | Commercial Register (HRB) No. 12154 | VAT ID: DE120607556
NOTE: Information on how we handle personal data can be found in our privacy policy at https://www.kamp.de/datenschutz.html

This message is intended solely for the addressee. Copying this message or making it accessible to third parties is not permitted. If you have received this message in error, please notify me by e-mail or at the telephone number given above.




> On 01.09.2021 at 15:19, Alcatraz <admin@alcatraz.network> wrote:
> 
> Sebastian,
> 
> 
> I appreciate all your help. I actually (out of desperation) spun up another cluster, same specs, just using Ubuntu 18.04 rather than Debian 10. All the OSDs were recognized, and all went up/in without issue.
> 
> 
> Thanks
> 
> On 9/1/21 06:15, Sebastian Wagner wrote:
>> On 30.08.21 at 17:39, Alcatraz wrote:
>>> Sebastian,
>>> 
>>> 
>>> Thanks for responding! And of course.
>>> 
>>> 
>>> 1. ceph orch ls --service-type osd --format yaml
>>> 
>>> Output:
>>> 
>>> service_type: osd
>>> service_id: all-available-devices
>>> service_name: osd.all-available-devices
>>> placement:
>>>   host_pattern: '*'
>>> unmanaged: true
>>> spec:
>>>   data_devices:
>>>     all: true
>>>   filter_logic: AND
>>>   objectstore: bluestore
>>> status:
>>>   created: '2021-08-30T13:57:51.000178Z'
>>>   last_refresh: '2021-08-30T15:24:10.534710Z'
>>>   running: 0
>>>   size: 6
>>> events:
>>> - 2021-08-30T03:48:01.652108Z service:osd.all-available-devices [INFO] "service was
>>>   created"
>>> - "2021-08-30T03:49:00.267808Z service:osd.all-available-devices [ERROR] \"Failed\
>>>   \ to apply: cephadm exited with an error code: 1, stderr:Non-zero exit code 1 from\
>>>   \ /usr/bin/docker container inspect --format {{.State.Status}} ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0\n\
>>>   /usr/bin/docker: stdout \n/usr/bin/docker: stderr Error: No such container: ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.0\n\
>>>   Deploy daemon osd.0 ...\nTraceback (most recent call last):\n File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
>>>   , line 8230, in <module>\n    main()\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
>>>   , line 8218, in main\n    r = ctx.func(ctx)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
>>>   , line 1759, in _default_image\n    return func(ctx)\n  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
>>>   , line 4326, in command_deploy\n    ports=daemon_ports)\n File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\ 
>>>   , line 2632, in deploy_daemon\n    c, osd_fsid=osd_fsid, ports=ports)\n  File \"\
>>> /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\ 
>>>   , line 2801, in deploy_daemon_units\n    install_sysctl(ctx, fsid, daemon_type)\n\
>>>   \  File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\
>>>   , line 2963, in install_sysctl\n    _write(conf, lines)\n File \"/var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931\"\ 
>>>   , line 2948, in _write\n    with open(conf, 'w') as f:\nFileNotFoundError: [Errno\
>>>   \ 2] No such file or directory: '/usr/lib/sysctl.d/90-ceph-d1405594-0944-11ec-8ebc-f23c92edc936-osd.conf'\""
>> 
>> https://tracker.ceph.com/issues/52481
>> 
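>> Until that fix is released, one possible workaround (assuming the
>> directory is simply missing on Debian 10, which is what the
>> FileNotFoundError above suggests) might be to create it on every host
>> before re-applying the spec:
>> 
>>     mkdir -p /usr/lib/sysctl.d
>> 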
>>> - '2021-08-30T03:49:08.356762Z service:osd.all-available-devices [ERROR] "Failed to
>>>   apply: auth get failed: failed to find osd.0 in keyring retval: -2"'
>>> - '2021-08-30T03:52:34.100977Z service:osd.all-available-devices [ERROR] "Failed to
>>>   apply: auth get failed: failed to find osd.3 in keyring retval: -2"'
>>> - '2021-08-30T03:52:42.260439Z service:osd.all-available-devices [ERROR] "Failed to
>>>   apply: auth get failed: failed to find osd.6 in keyring retval: -2"'
>> 
>> Will be fixed by https://github.com/ceph/ceph/pull/42989
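>> 
>> In the meantime you can at least see which cephx entries actually exist
>> with something like
>> 
>>     ceph auth ls | grep '^osd\.'
>> 
>> which should show whether the osd.0/osd.3/osd.6 keys were ever created,
>> or whether only the orchestrator's bookkeeping is stale.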
>> 
>> 
>>> 
>>> 
>>> 2. ceph orch ps --daemon-type osd --format yaml
>>> 
>>> Output: ...snip...
>>> 
>>> 3. ceph auth add osd.0 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring
>>> 
>>> I verified that the /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring file does exist.
>>> 
>>> Output:
>>> 
>>> Error EINVAL: caps cannot be specified both in keyring and in command
>> 
>> 
>> You only need to create the keyring; you don't need to store the keyring file anywhere. I'd still suggest creating the keyring somehow, but I haven't seen this particular error before.
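>> 
>> One guess: that EINVAL usually means the keyring file under osd.0/
>> already carries its own caps, so passing caps on the command line as
>> well is rejected. A rough sketch of two alternatives (untested against
>> this exact failure, paths as in your output):
>> 
>>     # let the caps come from the keyring file only
>>     ceph auth add osd.0 -i /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring
>> 
>>     # or import the entry wholesale
>>     ceph auth import -i /var/lib/ceph/d1405594-0944-11ec-8ebc-f23c92edc936/osd.0/keyring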
>> 
>> 
>> hth
>> 
>> Sebastian
>> 
>> 
>>> 
>>> Thanks
>>> 
>>> On 8/30/21 10:28, Sebastian Wagner wrote:
>>>> Could you run
>>>> 
>>>> 1. ceph orch ls --service-type osd --format yaml
>>>> 
>>>> 2. ceph orch ps --daemon-type osd --format yaml
>>>> 
>>>> 3. Try running the `ceph auth add` call from https://docs.ceph.com/en/mimic/rados/operations/add-or-rm-osds/#adding-an-osd-manual
>>>> 
>>>> 
>>>> On 30.08.21 at 14:49, Alcatraz wrote:
>>>>> Hello all,
>>>>> 
>>>>> Running into some issues trying to build a virtual PoC for Ceph. Went to my cloud provider of choice and spun up some nodes. I have three identical hosts consisting of:
>>>>> 
>>>>> Debian 10
>>>>> 8 cpu cores
>>>>> 16GB RAM
>>>>> 1x 315 GB boot drive
>>>>> 3x 400 GB data drives
>>>>> 
>>>>> After deploying Ceph (v 16.2.5) using cephadm, adding hosts, and logging into the dashboard, Ceph showed 9 OSDs, 0 up, 9 in. I thought perhaps it just needed some time to bring up the OSDs, so I left it running overnight.
>>>>> 
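>>>>> For reference, the deployment steps were roughly the standard cephadm
>>>>> flow (IPs below are placeholders):
>>>>> 
>>>>>     cephadm bootstrap --mon-ip <mon-ip>
>>>>>     ceph orch host add ceph02 <ip>
>>>>>     ceph orch host add ceph03 <ip>
>>>>>     ceph orch apply osd --all-available-devices
>>>>> 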
>>>>> This morning, I checked, and the Ceph dashboard shows 9 OSDs, 0 up, 6 in, 3 out. I find this odd, as it hasn't been touched since it was deployed. Ceph health shows "HEALTH_OK", `ceph osd tree` outputs:
>>>>> 
>>>>> ID  CLASS  WEIGHT  TYPE NAME     STATUS  REWEIGHT  PRI-AFF
>>>>> -1              0  root default
>>>>>  0              0  osd.0           down         0  1.00000
>>>>>  1              0  osd.1           down         0  1.00000
>>>>>  2              0  osd.2           down         0  1.00000
>>>>>  3              0  osd.3           down   1.00000  1.00000
>>>>>  4              0  osd.4           down   1.00000  1.00000
>>>>>  5              0  osd.5           down   1.00000  1.00000
>>>>>  6              0  osd.6           down   1.00000  1.00000
>>>>>  7              0  osd.7           down   1.00000  1.00000
>>>>>  8              0  osd.8           down   1.00000  1.00000
>>>>> 
>>>>> and if I run `ls /var/run/ceph` the only thing it outputs is "d1405594-0944-11ec-8ebc-f23c92edc936" (sans quotes), which I assume is the cluster ID? So of course, if I run `ceph daemon osd.8 help` for example, it just returns:
>>>>> 
>>>>> Can't get admin socket path: unable to get conf option admin_socket for osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid types are: auth, mon, osd, mds, mgr, client\n"
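>>>>> 
>>>>> (Presumably, with a containerized deployment the admin sockets live
>>>>> inside the daemon containers, so, assuming the daemon were actually
>>>>> running, one would first enter its container, e.g.
>>>>> 
>>>>>     cephadm enter --name osd.8
>>>>>     ceph daemon osd.8 help
>>>>> 
>>>>> rather than calling ceph daemon directly on the host.)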
>>>>> 
>>>>> If I look at the log within the Ceph dashboard, no errors or warnings appear. Will Ceph not work on virtual hardware? Is there something I need to do to bring up the OSDs?
>>>>> 
>>>>> Just as I was about to send this email, I went to check the logs and they show the following (traceback omitted for length):
>>>>> 
>>>>> 8/30/21 7:44:15 AM[ERR]Failed to apply osd.all-available-devices spec DriveGroupSpec(name=all-available-devices->placement=PlacementSpec(host_pattern='*'), service_id='all-available-devices', service_type='osd', data_devices=DeviceSelection(all=True), osd_id_claims={}, unmanaged=False, filter_logic='AND', preview_only=False): auth get failed: failed to find osd.6 in keyring retval: -2
>>>>> 
>>>>> 8/30/21 7:45:19 AM[ERR]executing create_from_spec_one(([('ceph01', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a930bf98>), ('ceph02', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a81ac8d0>), ('ceph03', <ceph.deployment.drive_selection.selector.DriveSelection object at 0x7f63a930b0b8>)],)) failed.
>>>>> 
>>>>> and similar for the other OSDs. I'm not sure why it's complaining about auth, because in order to even add the hosts to the cluster I had to copy the ceph public key to the hosts to begin with.
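>>>>> 
>>>>> (In case it helps, I can also share the output of
>>>>> 
>>>>>     ceph auth ls
>>>>> 
>>>>> since, as far as I understand, the SSH key used for adding hosts and
>>>>> the cephx keyrings mentioned in the error are separate things; please
>>>>> correct me if that's wrong.)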
>>>>> 
>>> 
>> 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



