Re: Getting cephadm "stderr:Inferring config" every minute in log - for a monitor that doesn't exist and shouldn't exist

"Robert W. Eckert" <rob@xxxxxxxxxxxxxxx> · Mon, 20 Sep 2021 18:23:01 +0000

That may be pointing in the right direction - I see

   {
       "style": "legacy",
       "name": "mon.rhel1.robeckert.us",
       "fsid": "fe3a7cb0-69ca-11eb-8d45-c86000d08867",
       "systemd_unit": "ceph-mon@xxxxxxxxxxxxxxxxxx",
       "enabled": false,
       "state": "stopped",
       "host_version": "16.2.5"
   },

And
    {
        "style": "cephadm:v1",
        "name": "mon.rhel1",
        "fsid": "fe3a7cb0-69ca-11eb-8d45-c86000d08867",
        "systemd_unit": "ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867@mon.rhel1",
        "enabled": true,
        "state": "running",
        "service_name": "mon",
        "ports": [],
        "ip": null,
        "deployed_by": [
            "quay.io/ceph/ceph@sha256:5d042251e1faa1408663508099cf97b256364300365d403ca5563a518060abac",
            "quay.io/ceph/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530"
        ],
        "rank": null,
        "rank_generation": null,
        "memory_request": null,
        "memory_limit": null,
        "container_id": null,
        "container_image_name": "quay.io/ceph/ceph@sha256:5d042251e1faa1408663508099cf97b256364300365d403ca5563a518060abac",
        "container_image_id": null,
        "container_image_digests": null,
        "version": null,
        "started": null,
        "created": "2021-09-20T15:46:42.166486Z",
        "deployed": "2021-09-20T15:46:41.136498Z",
        "configured": "2021-09-20T15:47:23.002007Z"
    }

As the output.
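
In case it helps anyone hitting the same thing, here is a minimal Python sketch for picking leftover entries like the first one out of that output. It assumes the JSON array printed by `cephadm ls` has been captured as text. The function name `find_legacy_daemons` and the trimmed sample data are made up for illustration and are not part of cephadm.

```python
import json


def find_legacy_daemons(cephadm_ls_json, daemon_type="mon"):
    """Return the names of legacy-style daemons of the given type.

    cephadm_ls_json is assumed to be the JSON array printed by `cephadm ls`.
    """
    entries = json.loads(cephadm_ls_json)
    prefix = daemon_type + "."
    return [
        entry["name"]
        for entry in entries
        if entry.get("style") == "legacy"
        and entry.get("name", "").startswith(prefix)
    ]


# Trimmed sample modeled on the two entries shown above (most fields omitted).
sample = json.dumps([
    {"style": "legacy", "name": "mon.rhel1.robeckert.us", "state": "stopped"},
    {"style": "cephadm:v1", "name": "mon.rhel1", "state": "running"},
])

print(find_legacy_daemons(sample))
```

Any name this returns is a daemon that cephadm still detects in the old, pre-container layout, which is a hint to look under the legacy directories for stale files or links.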

In /var/lib/ceph/mon (not /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon), there is a symlink:
ceph-rhel1.robeckert.us -> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/

I removed the link and the error did clear up.  (hopefully it will stay gone :-))
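
For anyone who wants to check for the same kind of leftover before removing anything, here is a rough sketch that lists the symlinks in a directory and reports whether each target still exists. The helper name `list_symlinks` is hypothetical, and the directory in the usage part is simply the legacy mon path mentioned above. It only reports and does not delete anything.

```python
import os


def list_symlinks(directory):
    """Return sorted (link_path, target, target_exists) tuples for symlinks in directory."""
    results = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_symlink():
                target = os.readlink(entry.path)
                # os.path.exists follows the link, so it is False for a dangling link.
                results.append((entry.path, target, os.path.exists(entry.path)))
    return sorted(results)


if __name__ == "__main__":
    legacy_mon_dir = "/var/lib/ceph/mon"  # legacy location, as in the message above
    if os.path.isdir(legacy_mon_dir):
        for path, target, exists in list_symlinks(legacy_mon_dir):
            status = "target exists" if exists else "dangling"
            print(f"{path} -> {target} ({status})")
```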

Thanks,

Rob

-----Original Message-----
From: Fyodor Ustinov <ufm@xxxxxx> 
Sent: Monday, September 20, 2021 2:01 PM
To: Robert W. Eckert <rob@xxxxxxxxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxx>
Subject: Re:  Getting cephadm "stderr:Inferring config" every minute in log - for a monitor that doesn't exist and shouldn't exist

Hi!

It looks exactly the same as the problem I had. 

Try the `cephadm ls` command on the `rhel1.robeckert.us` node. 

----- Original Message -----
> From: "Robert W. Eckert" <rob@xxxxxxxxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Monday, 20 September, 2021 18:28:08
> Subject:  Getting cephadm "stderr:Inferring config" every 
> minute in log - for a monitor that doesn't exist and shouldn't exist

> Hi- after the upgrade to 16.2.6, I am now seeing this error:
> 
> 9/20/21 10:45:00 AM[ERR]cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
> ERROR: [Errno 2] No such file or directory: '/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1366, in _remote_connection
>     yield (conn, connr)
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1263, in _run_cephadm
>     code, '\n'.join(err)))
> orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
> ERROR: [Errno 2] No such file or directory: '/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
> 
> The rhel1 server has a monitor under
> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1, and it is up and active.
> If I copy /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1 to
> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us,
> the error clears. Then cephadm removes the folder with the domain name, and
> the error starts showing up in the log again.
> 
> After a few minutes, I get the all clear:
> 
> 9/20/21 11:00:00 AM[INF]overall HEALTH_OK
> 
> 9/20/21 10:58:38 AM[INF]Removing key for mon.
> 
> 9/20/21 10:58:37 AM[INF]Removing daemon mon.rhel1.robeckert.us from 
> rhel1.robeckert.us
> 
> 9/20/21 10:58:37 AM[INF]Removing monitor rhel1.robeckert.us from monmap...
> 
> 9/20/21 10:58:37 AM[INF]Safe to remove mon.rhel1.robeckert.us: not in 
> monmap (['rhel1', 'story', 'cube'])
> 
> 9/20/21 10:52:21 AM[INF]Cluster is now healthy
> 
> 9/20/21 10:52:21 AM[INF]Health check cleared: CEPHADM_REFRESH_FAILED (was:
> failed to probe daemons or devices)
> 
> 9/20/21 10:51:15 AM
> 
> 
> I checked all of the configurations and can't find any reason it wants 
> the monitor with the domain.
> 
> But then the errors start up again. I haven't found any messages that
> precede them, so I am going to monitor more closely.
> This doesn't seem to affect any functionality; it just produces lots of messages in the log.
> 
> Thanks,
> Rob
> 
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an 
> email to ceph-users-leave@xxxxxxx