Re: Ceph standby-replay metadata server: MDS internal heartbeat is not healthy

Patrick Donnelly <pdonnell@xxxxxxxxxx> · Wed, 19 Feb 2020 20:15:59 -0800

Hi Martin,

On Thu, Feb 13, 2020 at 4:10 AM Martin Palma <martin@xxxxxxxx> wrote:
>
> Hi all,
>
> today we observe that all of a sudden our standby-replay metadata
> server continuously writes the following logs:
>
> 2020-02-13 11:56:50.216102 7fd2ad229700  1 heartbeat_map is_healthy
> 'MDSRank' had timed out after 15
> 2020-02-13 11:56:50.287699 7fd2ad229700  0 mds.beacon.dcucmds401
> Skipping beacon heartbeat to monitors (last acked 100.836s ago); MDS
> internal heartbeat is not healthy!
>
> and its memory usage keeps growing until no memory is available any
> more; the service then gets restarted and stops. The funny thing is
> that on the active MDS we are not seeing these log messages or any
> increase in memory usage.
>
> We are running ceph version 12.2.10 on all nodes of our Ceph cluster.
> Any suggestions?

Please increase debugging on the standby-replay daemon and share the logs.
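
As a rough sketch of one way to do that (not necessarily the only one): the
helper below only builds the `ceph daemon ... config set` command lines, which
would then be run on the host where the standby-replay daemon lives, since
they go through its admin socket. The daemon name `dcucmds401` is taken from
the beacon line in the quoted log; the levels `debug_mds 20` and `debug_ms 1`
are assumptions (commonly requested values), not something specified in this
thread, and the function name `debug_commands` is purely illustrative.

```python
import shlex


def debug_commands(mds_name, debug_mds=20, debug_ms=1):
    """Build admin-socket commands that raise MDS log verbosity.

    mds_name is the daemon name without the "mds." prefix.
    The default levels are assumed values, not taken from the thread.
    """
    target = "mds.{}".format(mds_name)
    return [
        ["ceph", "daemon", target, "config", "set", "debug_mds", str(debug_mds)],
        ["ceph", "daemon", target, "config", "set", "debug_ms", str(debug_ms)],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for argv in debug_commands("dcucmds401"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

At these levels the MDS log grows quickly, so it is worth lowering the values
again once the problem has been captured.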

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D