Looks like the immediate danger has passed:

[root@gnosis ~]# ceph status
  cluster:
    id:     e4ece518-f2cb-4708-b00f-b6bf511e91d9
    health: HEALTH_WARN
            nodown,noout flag(s) set
            735 slow ops, oldest one blocked for 3573 sec, daemons [mon.ceph-02,mon.ceph-03] have slow ops.

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active), standbys: ceph-03, ceph-02
    mds: con-fs2-1/1/1 up {0=ceph-08=up:active}, 1 up:standby-replay
    osd: 288 osds: 268 up, 268 in
         flags nodown,noout

  data:
    pools:   10 pools, 2545 pgs
    objects: 86.76 M objects, 218 TiB
    usage:   277 TiB used, 1.5 PiB / 1.8 PiB avail
    pgs:     2537 active+clean
             8    active+clean+scrubbing+deep

  io:
    client: 34 MiB/s rd, 24 MiB/s wr, 954 op/s rd, 1.01 kop/s wr

I will prepare a new case with the info we have collected so far.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Amit Ghadge <amitg.b14@xxxxxxxxx>
Sent: 20 May 2020 09:44
To: Frank Schilder
Subject: Re: total ceph outage again, need help

It looks like ceph-01 (the mgr) is still starting; I think that is why the command was not executed. You could also try disabling scrubbing temporarily.

On Wed, May 20, 2020 at 12:57 PM Frank Schilder <frans@xxxxxx> wrote:
Dear cephers,

I'm sitting with a major ceph outage again. The mon/mgr hosts suffer from a packet storm of ceph traffic between ceph fs clients and the mons. No idea why this is happening. The main problem is that I can't get through to the cluster; admin commands hang forever:

[root@gnosis ~]# ceph osd set nodown

However, "ceph status" returns and shows me that I need to do something:

[root@gnosis ~]# ceph status
  cluster:
    id:     ---
    health: HEALTH_WARN
            2 MDSs report slow metadata IOs
            1 MDSs report slow requests
            8 osds down

  services:
    mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
    mgr: ceph-01(active, starting), standbys: ceph-02, ceph-03
    mds: con-fs2-1/1/1 up {0=ceph-08=up:active}, 1 up:standby-replay
    osd: 288 osds: 208 up, 216 in; 153 remapped pgs

  data:
    pools:   10 pools, 2545 pgs
    objects: 86.71 M objects, 218 TiB
    usage:   277 TiB used, 1.5 PiB / 1.8 PiB avail
    pgs:     2542 active+clean
             3    active+clean+scrubbing+deep

  io:
    client: 152 MiB/s rd, 72 MiB/s wr, 854 op/s rd, 796 op/s wr

Is there any way to get admin commands to the mons with higher priority?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
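
For reference, the knobs discussed in this thread are ordinary Ceph CLI calls, and one common way around a hung client command path is to query a monitor over its local admin socket. The following is a rough sketch, not a procedure taken from the thread: it assumes an admin keyring is available, that the admin socket is at its default location on the mon hosts, and it reuses the daemon names (mon.ceph-01 etc.) that appear above.

  # Quiesce scrubbing while the cluster is struggling (reversible):
  ceph osd set noscrub
  ceph osd set nodeep-scrub

  # Keep OSDs from being marked down/out while the mons are overloaded:
  ceph osd set nodown
  ceph osd set noout

  # Clear the flags again once things have settled:
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub
  ceph osd unset nodown
  ceph osd unset noout

  # If "ceph ..." hangs from the admin node, the admin socket on a mon host
  # usually still answers; run these locally on ceph-01/ceph-02/ceph-03:
  ceph daemon mon.ceph-01 mon_status   # quorum membership and state of this mon
  ceph daemon mon.ceph-01 ops          # ops in flight, i.e. the slow ops from HEALTH_WARN
  ceph daemon mon.ceph-01 sessions     # connected clients, relevant to the packet-storm theory

The admin-socket calls go over a local Unix socket rather than the cluster messenger, which is why they tend to keep working when cluster-wide commands hang; they only report on the one daemon they are pointed at, though.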