Looks like the immediate danger has passed:

[root@gnosis ~]# ceph status
  cluster:
    id:     e4ece518-f2cb-4708-b00f-b6bf511e91d9
    health: HEALTH_WARN
            nodown,noout flag(s) set
            735 slow ops, oldest one blocked for 3573 sec, daemons [mon.ceph-02,mon.ceph-03] have slow ops.

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active), standbys: ceph-03, ceph-02
    mds: con-fs2-1/1/1 up {0=ceph-08=up:active}, 1 up:standby-replay
    osd: 288 osds: 268 up, 268 in
         flags nodown,noout

  data:
    pools:   10 pools, 2545 pgs
    objects: 86.76 M objects, 218 TiB
    usage:   277 TiB used, 1.5 PiB / 1.8 PiB avail
    pgs:     2537 active+clean
             8    active+clean+scrubbing+deep

  io:
    client: 34 MiB/s rd, 24 MiB/s wr, 954 op/s rd, 1.01 kop/s wr

I will prepare a new case with the info we have collected so far.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Amit Ghadge <amitg.b14@xxxxxxxxx>
Sent: 20 May 2020 09:44
To: Frank Schilder
Subject: Re: total ceph outage again, need help

It looks like ceph-01 (the mgr) is still starting; I think that is why the command was not executed. You could also try disabling scrubbing temporarily.

On Wed, May 20, 2020 at 12:57 PM Frank Schilder <frans@xxxxxx> wrote:
Dear cephers,

I'm sitting with a major ceph outage again. The mon/mgr hosts suffer from a packet storm of ceph traffic between ceph fs clients and the mons. No idea why this is happening. The main problem is that I can't get through to the cluster; admin commands hang forever:

[root@gnosis ~]# ceph osd set nodown

However, "ceph status" returns and shows me that I need to do something:

[root@gnosis ~]# ceph status
  cluster:
    id:     ---
    health: HEALTH_WARN
            2 MDSs report slow metadata IOs
            1 MDSs report slow requests
            8 osds down

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active, starting), standbys: ceph-02, ceph-03
    mds: con-fs2-1/1/1 up {0=ceph-08=up:active}, 1 up:standby-replay
    osd: 288 osds: 208 up, 216 in; 153 remapped pgs

  data:
    pools:   10 pools, 2545 pgs
    objects: 86.71 M objects, 218 TiB
    usage:   277 TiB used, 1.5 PiB / 1.8 PiB avail
    pgs:     2542 active+clean
             3    active+clean+scrubbing+deep

  io:
    client: 152 MiB/s rd, 72 MiB/s wr, 854 op/s rd, 796 op/s wr

Is there any way to get admin commands to the mons with higher priority?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
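
For reference, the knobs discussed in this thread are ordinary Ceph CLI calls, and one common way around a hung client command path is to query a monitor over its local admin socket. The following is a rough sketch, not a procedure taken from the thread: it assumes an admin keyring is available, that the admin socket is at its default location on the mon hosts, and it reuses the daemon names (mon.ceph-01 etc.) that appear above.

  # Quiesce scrubbing while the cluster is struggling (reversible):
  ceph osd set noscrub
  ceph osd set nodeep-scrub

  # Keep OSDs from being marked down/out while the mons are overloaded:
  ceph osd set nodown
  ceph osd set noout

  # Clear the flags again once things have settled:
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub
  ceph osd unset nodown
  ceph osd unset noout

  # If "ceph ..." hangs from the admin node, the admin socket on a mon host
  # usually still answers; run these locally on ceph-01/ceph-02/ceph-03:
  ceph daemon mon.ceph-01 mon_status   # quorum membership and state of this mon
  ceph daemon mon.ceph-01 ops          # ops in flight, i.e. the slow ops from HEALTH_WARN
  ceph daemon mon.ceph-01 sessions     # connected clients, relevant to the packet-storm theory

The admin-socket calls go over a local Unix socket rather than the cluster messenger, which is why they tend to keep working when cluster-wide commands hang; they only report on the one daemon they are pointed at, though.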