Re: name alertmanager/node-exporter already in use with v16.2.5

Alexander Sporleder <asporleder@xxxxxxxxxx> · Wed, 04 Aug 2021 15:18:46 +0200

Hello Harry!
Is the workaround still working for you? So far I haven't found a permanent fix. After a few days the "deployment"
starts again.

Best,
Alex 

On Sunday, 11.07.2021 at 19:58 +0000, Robert W. Eckert wrote:
> I had the same issue with Prometheus and Grafana; the same workaround worked for both.
> 
> -----Original Message-----
> From: Harry G. Coin <hgcoin@xxxxxxxxx> 
> Sent: Sunday, July 11, 2021 10:20 AM
> To: ceph-users@xxxxxxx
> Subject:  Re: name alertmanager/node-exporter already in use with v16.2.5
> 
> On 7/8/21 5:06 PM, Bryan Stillwell wrote:
> > I upgraded one of my clusters to v16.2.5 today and now I'm seeing these messages from 'ceph -W cephadm':
> > 
> > 2021-07-08T22:01:55.356953+0000 mgr.excalibur.kuumco [ERR] Failed to 
> > apply alertmanager spec AlertManagerSpec({'placement': PlacementSpec(count=1), 'service_type': 'alertmanager',
> > 'service_id': None, 'unmanaged': False, 'preview_only': False, 'networks': [], 'config': None, 'user_data': {},
> > 'port': None}): name alertmanager.aladdin already in use Traceback (most recent call last):
> >   File "/usr/share/ceph/mgr/cephadm/serve.py", line 582, in _apply_all_services
> >     if self._apply_service(spec):
> >   File "/usr/share/ceph/mgr/cephadm/serve.py", line 743, in _apply_service
> >     rank_generation=slot.rank_generation,
> >   File "/usr/share/ceph/mgr/cephadm/module.py", line 613, in get_unique_name
> >     f'name {daemon_type}.{name} already in use')
> > orchestrator._interface.OrchestratorValidationError: name 
> > alertmanager.aladdin already in use
> > 2021-07-08T22:01:55.372569+0000 mgr.excalibur.kuumco [ERR] Failed to 
> > apply node-exporter spec MonitoringSpec({'placement': PlacementSpec(host_pattern='*'), 'service_type':
> > 'node-exporter', 'service_id': None, 'unmanaged': False, 'preview_only': False, 'networks': [], 'config': None, 'port':
> > None}): name node-exporter.aladdin already in use Traceback (most recent call last):
> >   File "/usr/share/ceph/mgr/cephadm/serve.py", line 582, in _apply_all_services
> >     if self._apply_service(spec):
> >   File "/usr/share/ceph/mgr/cephadm/serve.py", line 743, in _apply_service
> >     rank_generation=slot.rank_generation,
> >   File "/usr/share/ceph/mgr/cephadm/module.py", line 613, in get_unique_name
> >     f'name {daemon_type}.{name} already in use')
> > orchestrator._interface.OrchestratorValidationError: name 
> > node-exporter.aladdin already in use
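
When the log gets long, it can help to list just the daemon names that cephadm reports as conflicting. The following is a minimal sketch, assuming only the message format visible in the log above ("name <daemon_type>.<daemon_id> already in use") and assuming any mail quote prefixes ("> ") have already been stripped. The function name and the sample text are illustrative and not part of cephadm.

```python
import re

# Message format taken from the log above:
#   "name <daemon_type>.<daemon_id> already in use"
# \s+ also matches a newline, so a name wrapped onto the next line is still found.
PATTERN = re.compile(r"name\s+([a-z][a-z0-9-]*)\.(\S+)\s+already in use")


def conflicting_daemons(log_text: str) -> list[tuple[str, str]]:
    """Return sorted, de-duplicated (daemon_type, daemon_id) pairs."""
    return sorted(set(PATTERN.findall(log_text)))


# Shortened, illustrative sample modeled on the messages quoted above.
sample = """\
[ERR] Failed to apply alertmanager spec ...: name alertmanager.aladdin already in use
orchestrator._interface.OrchestratorValidationError: name
alertmanager.aladdin already in use
[ERR] Failed to apply node-exporter spec ...: name node-exporter.aladdin already in use
"""

for daemon_type, daemon_id in conflicting_daemons(sample):
    print(f"{daemon_type}.{daemon_id}")
```

The pattern requires the daemon type to start with a lowercase letter, so the source line from the traceback (f'name {daemon_type}.{name} already in use') is not reported as a conflict.
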
> > 
> > Also my 'ceph -s' output keeps getting longer and longer (currently 517 lines) with messages like these:
> > 
> >     Updating node-exporter deployment (+6 -6 -> 13) (0s)
> >       [............................]
> >     Updating alertmanager deployment (+1 -1 -> 1) (0s)
> >       [............................]
> > 
> > What's the best way to go about fixing this?  I've tried using 'ceph orch daemon redeploy alertmanager.aladdin' and
> > the same for node-exporter, but it doesn't seem to help.
> 
> 
> Workaround (caution: temporarily disruptive), assuming this is the only reported problem remaining after the upgrade
> otherwise completes:
> 
> 1.  ceph orch rm node-exporter  
> 
> Wait 30+ seconds.
> 
> 2.  Stop all managers.
> 
> 3.  Start all managers.
> 
> 4.  ceph orch apply node-exporter '*'
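
The steps above can be written down as a small dry-run helper. This is only a sketch: the two ceph commands are taken verbatim from steps 1 and 4, while the step names ("run", "wait", "manual"), the function names, and the dry-run default are assumptions made for illustration. How the managers are stopped and started is not specified in the workaround, so those steps stay manual here.

```python
import shlex
import subprocess
import time


def workaround_steps(service="node-exporter", placement="*", wait_seconds=30):
    """Return the workaround as an ordered list of (kind, payload) steps."""
    return [
        ("run", ["ceph", "orch", "rm", service]),                 # step 1
        ("wait", wait_seconds),                                   # "Wait 30+ seconds."
        ("manual", "Stop all managers."),                         # step 2
        ("manual", "Start all managers."),                        # step 3
        ("run", ["ceph", "orch", "apply", service, placement]),   # step 4
    ]


def execute(steps, dry_run=True):
    """Print each step; only run commands, sleep, or prompt when dry_run is False."""
    for kind, payload in steps:
        if kind == "run":
            print("$", shlex.join(payload))
            if not dry_run:
                subprocess.run(payload, check=True)
        elif kind == "wait":
            print(f"(wait {payload}+ seconds)")
            if not dry_run:
                time.sleep(payload)
        else:
            print("MANUAL:", payload)
            if not dry_run:
                input("Press Enter when done... ")


# Dry run by default: nothing is executed against the cluster.
execute(workaround_steps())
```

Passing the arguments as a list avoids shell expansion of the '*' placement. Since the workaround is temporarily disruptive, the sketch only prints the plan unless dry_run=False is passed explicitly.
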
> 
> 
> > 
> > Thanks,
> > Bryan
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an 
> > email to ceph-users-leave@xxxxxxx
