That may be pointing in the right direction - I see

  {
    "style": "legacy",
    "name": "mon.rhel1.robeckert.us",
    "fsid": "fe3a7cb0-69ca-11eb-8d45-c86000d08867",
    "systemd_unit": "ceph-mon@xxxxxxxxxxxxxxxxxx",
    "enabled": false,
    "state": "stopped",
    "host_version": "16.2.5"
  },

and

  {
    "style": "cephadm:v1",
    "name": "mon.rhel1",
    "fsid": "fe3a7cb0-69ca-11eb-8d45-c86000d08867",
    "systemd_unit": "ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867@mon.rhel1",
    "enabled": true,
    "state": "running",
    "service_name": "mon",
    "ports": [],
    "ip": null,
    "deployed_by": [
      "quay.io/ceph/ceph@sha256:5d042251e1faa1408663508099cf97b256364300365d403ca5563a518060abac",
      "quay.io/ceph/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530"
    ],
    "rank": null,
    "rank_generation": null,
    "memory_request": null,
    "memory_limit": null,
    "container_id": null,
    "container_image_name": "quay.io/ceph/ceph@sha256:5d042251e1faa1408663508099cf97b256364300365d403ca5563a518060abac",
    "container_image_id": null,
    "container_image_digests": null,
    "version": null,
    "started": null,
    "created": "2021-09-20T15:46:42.166486Z",
    "deployed": "2021-09-20T15:46:41.136498Z",
    "configured": "2021-09-20T15:47:23.002007Z"
  }

as the output.

In /var/lib/ceph/mon (not /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon), there is a link:

  ceph-rhel1.robeckert.us -> /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/

I removed the link and the error did clear up. (Hopefully it will stay gone :-))

Thanks,
Rob

-----Original Message-----
From: Fyodor Ustinov <ufm@xxxxxx>
Sent: Monday, September 20, 2021 2:01 PM
To: Robert W. Eckert <rob@xxxxxxxxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxx>
Subject: Re: Getting cephadm "stderr:Inferring config" every minute in log - for a monitor that doesn't exist and shouldn't exist

Hi!

It looks exactly the same as the problem I had.

Try the `cephadm ls` command on the `rhel1.robeckert.us` node.

----- Original Message -----
> From: "Robert W. Eckert" <rob@xxxxxxxxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Monday, 20 September, 2021 18:28:08
> Subject: Getting cephadm "stderr:Inferring config" every minute in log - for a monitor that doesn't exist and shouldn't exist

> Hi- after the upgrade to 16.2.6, I am now seeing this error:
>
> 9/20/21 10:45:00 AM[ERR]cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
> ERROR: [Errno 2] No such file or directory: '/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1366, in _remote_connection
>     yield (conn, connr)
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1263, in _run_cephadm
>     code, '\n'.join(err)))
> orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Inferring config /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config
> ERROR: [Errno 2] No such file or directory: '/var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us/config'
>
> The rhel1 server has a monitor under /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1 , and it is up and active. If I copy the /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1 to /var/lib/ceph/fe3a7cb0-69ca-11eb-8d45-c86000d08867/mon.rhel1.robeckert.us the error clears, then cephadm removes the folder with the domain name, and the error starts showing up in the log again.
>
> After a few minutes, I get the all clear:
>
> 9/20/21 11:00:00 AM[INF]overall HEALTH_OK
>
> 9/20/21 10:58:38 AM[INF]Removing key for mon.
>
> 9/20/21 10:58:37 AM[INF]Removing daemon mon.rhel1.robeckert.us from rhel1.robeckert.us
>
> 9/20/21 10:58:37 AM[INF]Removing monitor rhel1.robeckert.us from monmap...
>
> 9/20/21 10:58:37 AM[INF]Safe to remove mon.rhel1.robeckert.us: not in monmap (['rhel1', 'story', 'cube'])
>
> 9/20/21 10:52:21 AM[INF]Cluster is now healthy
>
> 9/20/21 10:52:21 AM[INF]Health check cleared: CEPHADM_REFRESH_FAILED (was: failed to probe daemons or devices)
>
> 9/20/21 10:51:15 AM
>
>
> I checked all of the configurations and can't find any reason it wants the monitor with the domain.
>
> But then the errors start up again - I haven't found any messages before they start up, I am going to monitor more closely.
> This doesn't seem to affect any functionality, just lots of messages in the log.
>
> Thanks,
> Rob
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
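For anyone hitting the same symptom, the check described in this thread (look through the `cephadm ls` output for a stopped, disabled "legacy" entry sitting next to a running cephadm-managed daemon of the same type) could be sketched roughly as below. This is only an illustration: the function name find_stale_legacy_daemons and the matching rule are assumptions made here, not part of cephadm, and the sample input is trimmed from the output quoted above. It only reports candidates; it does not remove anything.

```python
import json


def find_stale_legacy_daemons(cephadm_ls_json):
    """Return legacy-style entries that are stopped and disabled while a
    cephadm-managed daemon of the same type exists for the same fsid.

    Hypothetical helper: the matching rule is an assumption for
    illustration, not logic taken from cephadm itself.
    """
    daemons = json.loads(cephadm_ls_json)

    # (fsid, daemon type) pairs that cephadm already manages, e.g. (fsid, "mon")
    managed = {
        (d.get("fsid"), d.get("name", "").split(".", 1)[0])
        for d in daemons
        if d.get("style") == "cephadm:v1"
    }

    stale = []
    for d in daemons:
        if d.get("style") != "legacy":
            continue
        daemon_type = d.get("name", "").split(".", 1)[0]
        if (
            d.get("state") == "stopped"
            and not d.get("enabled", False)
            and (d.get("fsid"), daemon_type) in managed
        ):
            stale.append(d)
    return stale


# Trimmed copy of the two entries quoted earlier in the thread.
sample = """
[
  {"style": "legacy", "name": "mon.rhel1.robeckert.us",
   "fsid": "fe3a7cb0-69ca-11eb-8d45-c86000d08867",
   "enabled": false, "state": "stopped", "host_version": "16.2.5"},
  {"style": "cephadm:v1", "name": "mon.rhel1",
   "fsid": "fe3a7cb0-69ca-11eb-8d45-c86000d08867",
   "enabled": true, "state": "running", "service_name": "mon"}
]
"""

for entry in find_stale_legacy_daemons(sample):
    print(entry["name"])  # prints: mon.rhel1.robeckert.us
```

On a real host the JSON would come from running `cephadm ls` on the affected node and feeding its output to the function. Any entry it reports is only a hint to go and look for leftovers such as the symlink under /var/lib/ceph/mon mentioned above.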