Re: Ceph meltdown, need help

Thanks! Here it is:

[root@gnosis ~]# ceph osd dump | grep require
require_min_compat_client jewel
require_osd_release mimic

It looks like we had an extremely aggressive job running on our cluster, completely flooding everything with small I/O. I think the cluster built up a huge backlog and was really busy trying to serve that I/O. In the process, beacons/heartbeats were lost or got too old.

Is there a way to pause client I/O?
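For reference, Ceph's OSD map does carry a cluster-wide `pause` flag that blocks all client reads and writes while it is set. A minimal sketch, assuming admin privileges on a mon/admin node (clients will stall until the flag is cleared, so use with care):

```shell
# Pause all client I/O cluster-wide (both reads and writes block).
ceph osd set pause

# ... let the OSDs work through the backlog, watch `ceph -s` ...

# Resume client I/O.
ceph osd unset pause
```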

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
Sent: 05 May 2020 17:25:56
To: Frank Schilder
Cc: ceph-users
Subject: Re:  Ceph meltdown, need help

Hi,

The osds are getting marked down due to this:

2020-05-05 15:18:42.893964 mon.ceph-01 mon.0 192.168.32.65:6789/0
292689 : cluster [INF] osd.40 marked down after no beacon for
903.781033 seconds
2020-05-05 15:18:42.894009 mon.ceph-01 mon.0 192.168.32.65:6789/0
292690 : cluster [INF] osd.60 marked down after no beacon for
903.780916 seconds
2020-05-05 15:18:42.894075 mon.ceph-01 mon.0 192.168.32.65:6789/0
292691 : cluster [INF] osd.170 marked down after no beacon for
903.780957 seconds
2020-05-05 15:18:42.894108 mon.ceph-01 mon.0 192.168.32.65:6789/0
292692 : cluster [INF] osd.244 marked down after no beacon for
903.780661 seconds
2020-05-05 15:18:42.894159 mon.ceph-01 mon.0 192.168.32.65:6789/0
292693 : cluster [INF] osd.283 marked down after no beacon for
903.780998 seconds
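The ~903-second gaps above line up with the monitor's beacon grace period, which defaults to 900 seconds. A hedged sketch for inspecting the two relevant options (option names as in Mimic-era releases; the daemon names `mon.ceph-01` and `osd.40` are taken from the logs above, verify against your own cluster):

```shell
# How long the mon waits without a beacon before marking an OSD down (default 900 s):
ceph daemon mon.ceph-01 config get mon_osd_report_timeout

# How often each OSD sends a beacon to the mons (default 300 s):
ceph daemon osd.40 config get osd_beacon_report_interval
```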

You're right to set nodown and noout, while trying to understand why
the beacon is not being sent.

Can you show the output of `ceph osd dump | grep require` ?
(I vaguely recall that after a mimic upgrade you need to flip some
switch to enable the beacon sending...)
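The switch being recalled here is most likely the cluster-wide release gate: OSDs only start sending beacons once `require_osd_release` is at the new release. A sketch, assuming all OSDs really are running Mimic (the flip is one-way):

```shell
# Check the current setting first:
ceph osd dump | grep require_osd_release

# If it still reports luminous, flip it (irreversible; all OSDs must be on Mimic):
ceph osd require-osd-release mimic
```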

--
Dan



On Tue, May 5, 2020 at 4:42 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Dear Dan,
>
> thank you for your fast response. Please find the log of the first OSD that went down and the ceph.log with these links:
>
> https://files.dtu.dk/u/tF1zv5zdc6mmXXO_/ceph.log?l
> https://files.dtu.dk/u/hPb5qax2-b6W9vmp/ceph-osd.2.log?l
>
> I can collect more osd logs if this helps.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
> Sent: 05 May 2020 16:25:31
> To: Frank Schilder
> Cc: ceph-users
> Subject: Re:  Ceph meltdown, need help
>
> Hi Frank,
>
> Could you share any ceph-osd logs and also the ceph.log from a mon to
> see why the cluster thinks all those osds are down?
>
> Simply marking them up isn't going to help, I'm afraid.
>
> Cheers, Dan
>
>
> On Tue, May 5, 2020 at 4:12 PM Frank Schilder <frans@xxxxxx> wrote:
> >
> > Hi all,
> >
> > a lot of OSDs went down in our cluster (Mimic 13.2.8). Current status is included below. All daemons are running; no OSD process actually crashed. Can I start marking OSDs in and up to get them talking to each other again?
> >
> > Please advise on next steps. Thanks!!
> >
> > [root@gnosis ~]# ceph status
> >   cluster:
> >     id:     e4ece518-f2cb-4708-b00f-b6bf511e91d9
> >     health: HEALTH_WARN
> >             2 MDSs report slow metadata IOs
> >             1 MDSs report slow requests
> >             nodown,noout,norecover flag(s) set
> >             125 osds down
> >             3 hosts (48 osds) down
> >             Reduced data availability: 2221 pgs inactive, 1943 pgs down, 190 pgs peering, 13 pgs stale
> >             Degraded data redundancy: 5134396/500993581 objects degraded (1.025%), 296 pgs degraded, 299 pgs undersized
> >             9622 slow ops, oldest one blocked for 2913 sec, daemons [osd.0,osd.100,osd.101,osd.112,osd.118,osd.133,osd.136,osd.142,osd.144,osd.145]... have slow ops.
> >
> >   services:
> >     mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
> >     mgr: ceph-02(active), standbys: ceph-03, ceph-01
> >     mds: con-fs2-1/1/1 up  {0=ceph-08=up:active}, 1 up:standby-replay
> >     osd: 288 osds: 90 up, 215 in; 230 remapped pgs
> >          flags nodown,noout,norecover
> >
> >   data:
> >     pools:   10 pools, 2545 pgs
> >     objects: 62.61 M objects, 144 TiB
> >     usage:   219 TiB used, 1.6 PiB / 1.8 PiB avail
> >     pgs:     1.729% pgs unknown
> >              85.540% pgs not active
> >              5134396/500993581 objects degraded (1.025%)
> >              1796 down
> >              226  active+undersized+degraded
> >              147  down+remapped
> >              140  peering
> >              65   active+clean
> >              44   unknown
> >              38   undersized+degraded+peered
> >              38   remapped+peering
> >              17   active+undersized+degraded+remapped+backfill_wait
> >              12   stale+peering
> >              12   active+undersized+degraded+remapped+backfilling
> >              4    active+undersized+remapped
> >              2    remapped
> >              2    undersized+degraded+remapped+peered
> >              1    stale
> >              1    undersized+degraded+remapped+backfilling+peered
> >
> >   io:
> >     client:   26 KiB/s rd, 206 KiB/s wr, 21 op/s rd, 50 op/s wr
> >
> > [root@gnosis ~]# ceph health detail
> > HEALTH_WARN 2 MDSs report slow metadata IOs; 1 MDSs report slow requests; nodown,noout,norecover flag(s) set; 125 osds down; 3 hosts (48 osds) down; Reduced data availability: 2219 pgs inactive, 1943 pgs down, 188 pgs peering, 13 pgs stale; Degraded data redundancy: 5214696/500993589 objects degraded (1.041%), 298 pgs degraded, 299 pgs undersized; 9788 slow ops, oldest one blocked for 2953 sec, daemons [osd.0,osd.100,osd.101,osd.112,osd.118,osd.133,osd.136,osd.142,osd.144,osd.145]... have slow ops.
> > MDS_SLOW_METADATA_IO 2 MDSs report slow metadata IOs
> >     mdsceph-08(mds.0): 100+ slow metadata IOs are blocked > 30 secs, oldest blocked for 2940 secs
> >     mdsceph-12(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 2942 secs
> > MDS_SLOW_REQUEST 1 MDSs report slow requests
> >     mdsceph-08(mds.0): 100 slow requests are blocked > 30 secs
> > OSDMAP_FLAGS nodown,noout,norecover flag(s) set
> > OSD_DOWN 125 osds down
> >     osd.0 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-21) is down
> >     osd.6 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-12) is down
> >     osd.7 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-10) is down
> >     osd.8 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-11) is down
> >     osd.16 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-08) is down
> >     osd.18 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-10) is down
> >     osd.19 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-11) is down
> >     osd.21 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-13) is down
> >     osd.31 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-18) is down
> >     osd.37 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-04) is down
> >     osd.38 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-07) is down
> >     osd.48 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-04) is down
> >     osd.51 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-22) is down
> >     osd.53 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-21) is down
> >     osd.55 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-19) is down
> >     osd.62 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-17) is down
> >     osd.67 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-11) is down
> >     osd.72 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-21) is down
> >     osd.75 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-08) is down
> >     osd.78 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-10) is down
> >     osd.79 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-11) is down
> >     osd.80 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-12) is down
> >     osd.81 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-13) is down
> >     osd.82 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-14) is down
> >     osd.83 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-15) is down
> >     osd.88 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-08) is down
> >     osd.89 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-10) is down
> >     osd.92 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-13) is down
> >     osd.93 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-12) is down
> >     osd.95 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-15) is down
> >     osd.96 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-16) is down
> >     osd.97 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-17) is down
> >     osd.100 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-13) is down
> >     osd.104 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-12) is down
> >     osd.105 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-13) is down
> >     osd.107 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-15) is down
> >     osd.108 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-17) is down
> >     osd.109 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-16) is down
> >     osd.111 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-14) is down
> >     osd.113 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-10) is down
> >     osd.114 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-09) is down
> >     osd.116 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-12) is down
> >     osd.117 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-13) is down
> >     osd.119 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-15) is down
> >     osd.122 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-12) is down
> >     osd.123 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-15) is down
> >     osd.124 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-08) is down
> >     osd.125 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-09) is down
> >     osd.126 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-10) is down
> >     osd.128 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-12) is down
> >     osd.131 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-15) is down
> >     osd.134 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-13) is down
> >     osd.139 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-10) is down
> >     osd.140 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-12) is down
> >     osd.141 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-13) is down
> >     osd.145 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-04) is down
> >     osd.149 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-10) is down
> >     osd.151 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-09) is down
> >     osd.152 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-12) is down
> >     osd.153 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-13) is down
> >     osd.154 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-14) is down
> >     osd.155 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-15) is down
> >     osd.156 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-04) is down
> >     osd.157 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-05) is down
> >     osd.159 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-07) is down
> >     osd.161 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-09) is down
> >     osd.162 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-10) is down
> >     osd.164 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-12) is down
> >     osd.165 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-13) is down
> >     osd.166 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-15) is down
> >     osd.167 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-14) is down
> >     osd.171 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-08) is down
> >     osd.172 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-07) is down
> >     osd.174 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-10) is down
> >     osd.176 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-12) is down
> >     osd.177 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-13) is down
> >     osd.179 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-15) is down
> >     osd.182 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-06) is down
> >     osd.183 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-07) is down
> >     osd.184 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-08) is down
> >     osd.186 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-10) is down
> >     osd.187 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-11) is down
> >     osd.190 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-14) is down
> >     osd.191 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-15) is down
> >     osd.194 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-16) is down
> >     osd.195 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-17) is down
> >     osd.196 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-16) is down
> >     osd.199 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-17) is down
> >     osd.200 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-16) is down
> >     osd.201 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-17) is down
> >     osd.202 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-16) is down
> >     osd.203 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-17) is down
> >     osd.204 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-16) is down
> >     osd.208 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-08) is down
> >     osd.210 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-08) is down
> >     osd.212 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-10) is down
> >     osd.213 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-11) is down
> >     osd.214 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-09) is down
> >     osd.215 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-10) is down
> >     osd.216 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-11) is down
> >     osd.218 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-09) is down
> >     osd.219 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-11) is down
> >     osd.221 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-12) is down
> >     osd.224 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-16) is down
> >     osd.226 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1,host=ceph-17) is down
> >     osd.228 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-20) is down
> >     osd.230 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-20) is down
> >     osd.233 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-19) is down
> >     osd.236 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-19) is down
> >     osd.238 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-18) is down
> >     osd.247 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-21) is down
> >     osd.248 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-18) is down
> >     osd.254 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-04) is down
> >     osd.256 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-04) is down
> >     osd.259 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-18) is down
> >     osd.260 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-20) is down
> >     osd.262 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-19) is down
> >     osd.266 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-18) is down
> >     osd.267 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-18) is down
> >     osd.272 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-20) is down
> >     osd.274 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-21) is down
> >     osd.275 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-19) is down
> >     osd.276 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-22) is down
> >     osd.281 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-22) is down
> >     osd.285 (root=DTU,region=Risoe,datacenter=ServerRoom,room=SR-113,host=ceph-05) is down
> > OSD_HOST_DOWN 3 hosts (48 osds) down
> >     host ceph-11 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1) (16 osds) is down
> >     host ceph-10 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1) (16 osds) is down
> >     host ceph-13 (root=DTU,region=Risoe,datacenter=ContainerSquare,room=CON-161A1) (16 osds) is down
> > PG_AVAILABILITY Reduced data availability: 2219 pgs inactive, 1943 pgs down, 188 pgs peering, 13 pgs stale
> >     pg 14.513 is stuck inactive for 1681.564244, current state down, last acting [2147483647,2147483647,2147483647,2147483647,2147483647,143,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.514 is down, acting [193,2147483647,2147483647,2147483647,2147483647,118,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.515 is down, acting [2147483647,2147483647,2147483647,211,133,135,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.516 is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,205,2147483647]
> >     pg 14.517 is down, acting [2147483647,2147483647,5,2147483647,2147483647,2147483647,2147483647,2147483647,61,112]
> >     pg 14.518 is down, acting [2147483647,198,2147483647,2147483647,2147483647,2147483647,4,185,2147483647,2147483647]
> >     pg 14.519 is down, acting [2147483647,2147483647,68,2147483647,2147483647,2147483647,2147483647,185,2147483647,94]
> >     pg 14.51a is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,101,2147483647]
> >     pg 14.51b is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,197,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.51c is down, acting [193,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,197]
> >     pg 14.51d is down, acting [2147483647,2147483647,61,2147483647,77,2147483647,2147483647,2147483647,112,2147483647]
> >     pg 14.51e is down, acting [2147483647,2147483647,2147483647,2147483647,112,2147483647,2147483647,193,2147483647,2147483647]
> >     pg 14.51f is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,94,2147483647,2147483647]
> >     pg 14.520 is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,207,2147483647,101,133,2147483647]
> >     pg 14.521 is down, acting [205,2147483647,133,2147483647,2147483647,2147483647,2147483647,4,2147483647,193]
> >     pg 14.522 is down, acting [101,2147483647,2147483647,11,197,2147483647,136,94,2147483647,2147483647]
> >     pg 14.523 is down, acting [2147483647,2147483647,2147483647,118,2147483647,71,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.524 is down, acting [2147483647,111,2147483647,2147483647,2147483647,8,2147483647,112,2147483647,2147483647]
> >     pg 14.525 is down, acting [2147483647,2147483647,2147483647,142,2147483647,61,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.526 is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,61,193,2147483647,2147483647,2147483647]
> >     pg 14.527 is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,109,2147483647,2147483647]
> >     pg 14.528 is down, acting [2147483647,133,2147483647,2147483647,2147483647,2147483647,4,2147483647,2147483647,2147483647]
> >     pg 14.529 is down, acting [2147483647,112,2147483647,2147483647,2147483647,2147483647,185,2147483647,118,2147483647]
> >     pg 14.52a is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,136,2147483647,135,2147483647,2147483647]
> >     pg 14.52b is down, acting [2147483647,2147483647,2147483647,112,142,211,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.52c is down, acting [185,2147483647,198,2147483647,118,2147483647,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.52d is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,5,2147483647,2147483647,2147483647]
> >     pg 14.52e is down, acting [71,101,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,142,2147483647]
> >     pg 14.52f is down, acting [198,2147483647,2147483647,2147483647,2147483647,11,2147483647,2147483647,118,2147483647]
> >     pg 14.530 is down, acting [142,2147483647,2147483647,2147483647,133,2147483647,2147483647,2147483647,2147483647,112]
> >     pg 14.531 is down, acting [2147483647,142,2147483647,2147483647,2147483647,185,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.532 is down, acting [135,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,136,118]
> >     pg 14.533 is down, acting [2147483647,77,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.534 is down, acting [2147483647,2147483647,2147483647,185,118,2147483647,2147483647,207,2147483647,2147483647]
> >     pg 14.535 is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,136,142,133,2147483647]
> >     pg 14.536 is down, acting [2147483647,11,2147483647,2147483647,136,2147483647,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.537 is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,77,2147483647]
> >     pg 14.538 is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,205,2147483647,2147483647]
> >     pg 14.539 is down, acting [2147483647,2147483647,2147483647,198,2147483647,2147483647,4,2147483647,2147483647,2147483647]
> >     pg 14.53a is down, acting [2147483647,11,136,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.53b is down, acting [2147483647,2147483647,2147483647,2147483647,112,2147483647,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.53c is down, acting [2147483647,2147483647,2147483647,71,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.53d is down, acting [2147483647,2147483647,2147483647,185,2147483647,2147483647,2147483647,2147483647,2147483647,136]
> >     pg 14.53e is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,112,185]
> >     pg 14.53f is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,185,2147483647,2147483647,2147483647]
> >     pg 14.540 is down, acting [205,2147483647,2147483647,2147483647,2147483647,2147483647,142,2147483647,112,77]
> >     pg 14.541 is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,197,211,2147483647,2147483647,2147483647]
> >     pg 14.542 is down, acting [112,2147483647,101,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.543 is down, acting [111,2147483647,2147483647,2147483647,2147483647,101,2147483647,2147483647,2147483647,2147483647]
> >     pg 14.544 is down, acting [4,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,205]
> >     pg 14.545 is down, acting [2147483647,2147483647,2147483647,2147483647,2147483647,142,5,2147483647,2147483647,2147483647]
> > PG_DEGRADED Degraded data redundancy: 5214696/500993589 objects degraded (1.041%), 298 pgs degraded, 299 pgs undersized
> >     pg 1.29 is stuck undersized for 2075.633328, current state active+undersized+degraded, last acting [253,258]
> >     pg 1.2a is stuck undersized for 1642.864920, current state active+undersized+degraded, last acting [252,255]
> >     pg 1.2b is stuck undersized for 2355.149928, current state active+undersized+degraded+remapped+backfill_wait, last acting [240,268]
> >     pg 1.2c is stuck undersized for 1459.277329, current state active+undersized+degraded, last acting [241,273]
> >     pg 1.2d is stuck undersized for 803.339131, current state undersized+degraded+peered, last acting [282]
> >     pg 2.25 is active+undersized+degraded, acting [253,2147483647,2147483647,258,261,273,277,243]
> >     pg 2.28 is stuck undersized for 803.340163, current state active+undersized+degraded, last acting [282,241,246,2147483647,273,252,2147483647,268]
> >     pg 2.29 is stuck undersized for 803.341160, current state active+undersized+degraded, last acting [240,258,277,264,2147483647,2147483647,271,250]
> >     pg 2.2a is stuck undersized for 1447.684978, current state active+undersized+degraded+remapped+backfilling, last acting [252,270,2147483647,261,2147483647,255,287,264]
> >     pg 2.2e is stuck undersized for 2030.849944, current state active+undersized+degraded, last acting [264,2147483647,251,245,257,286,261,258]
> >     pg 2.51 is stuck undersized for 1459.274671, current state active+undersized+degraded+remapped+backfilling, last acting [270,2147483647,2147483647,265,241,243,240,252]
> >     pg 2.52 is stuck undersized for 2030.850897, current state active+undersized+degraded+remapped+backfilling, last acting [240,2147483647,270,265,269,280,278,2147483647]
> >     pg 2.53 is stuck undersized for 1459.273517, current state active+undersized+degraded, last acting [261,2147483647,280,282,2147483647,245,243,241]
> >     pg 2.61 is stuck undersized for 2075.633140, current state active+undersized+degraded+remapped+backfilling, last acting [269,2147483647,258,286,270,255,2147483647,264]
> >     pg 2.62 is stuck undersized for 803.340577, current state active+undersized+degraded, last acting [2147483647,253,258,2147483647,250,287,264,284]
> >     pg 2.66 is stuck undersized for 803.341231, current state active+undersized+degraded, last acting [264,280,265,255,257,269,2147483647,270]
> >     pg 2.6c is stuck undersized for 963.369539, current state active+undersized+degraded, last acting [286,269,278,251,2147483647,273,2147483647,280]
> >     pg 2.70 is stuck undersized for 873.662725, current state active+undersized+degraded, last acting [2147483647,268,255,273,253,265,278,2147483647]
> >     pg 2.74 is stuck undersized for 2075.632312, current state active+undersized+degraded+remapped+backfilling, last acting [240,242,2147483647,245,243,269,2147483647,265]
> >     pg 3.24 is stuck undersized for 1570.800184, current state active+undersized+degraded, last acting [235,263]
> >     pg 3.25 is stuck undersized for 733.673503, current state undersized+degraded+peered, last acting [232]
> >     pg 3.28 is stuck undersized for 2610.307886, current state active+undersized+degraded, last acting [263,84]
> >     pg 3.2a is stuck undersized for 1214.710839, current state active+undersized+degraded, last acting [181,232]
> >     pg 3.2b is stuck undersized for 2075.630671, current state active+undersized+degraded, last acting [63,144]
> >     pg 3.52 is stuck undersized for 1570.777598, current state active+undersized+degraded, last acting [158,237]
> >     pg 3.54 is stuck undersized for 1350.257189, current state active+undersized+degraded, last acting [239,74]
> >     pg 3.55 is stuck undersized for 2592.642531, current state active+undersized+degraded, last acting [157,233]
> >     pg 3.5a is stuck undersized for 2075.608257, current state undersized+degraded+peered, last acting [168]
> >     pg 3.5c is stuck undersized for 733.674836, current state active+undersized+degraded, last acting [263,234]
> >     pg 3.5d is stuck undersized for 2610.307220, current state active+undersized+degraded, last acting [180,84]
> >     pg 3.5e is stuck undersized for 1710.756037, current state undersized+degraded+peered, last acting [146]
> >     pg 3.61 is stuck undersized for 1080.210021, current state active+undersized+degraded, last acting [168,239]
> >     pg 3.62 is stuck undersized for 831.217622, current state active+undersized+degraded, last acting [84,263]
> >     pg 3.63 is stuck undersized for 733.674204, current state active+undersized+degraded, last acting [263,232]
> >     pg 3.65 is stuck undersized for 1570.790824, current state active+undersized+degraded, last acting [63,84]
> >     pg 3.66 is stuck undersized for 733.682973, current state undersized+degraded+peered, last acting [63]
> >     pg 3.68 is stuck undersized for 1570.624462, current state active+undersized+degraded, last acting [229,148]
> >     pg 3.69 is stuck undersized for 1350.316213, current state undersized+degraded+peered, last acting [235]
> >     pg 3.6b is stuck undersized for 783.813654, current state undersized+degraded+peered, last acting [63]
> >     pg 3.6c is stuck undersized for 783.819083, current state undersized+degraded+peered, last acting [229]
> >     pg 3.6f is stuck undersized for 2610.321349, current state active+undersized+degraded, last acting [232,158]
> >     pg 3.72 is stuck undersized for 1350.358149, current state active+undersized+degraded, last acting [229,74]
> >     pg 3.73 is stuck undersized for 1570.788310, current state undersized+degraded+peered, last acting [234]
> >     pg 11.20 is stuck undersized for 733.682510, current state active+undersized+degraded, last acting [2147483647,239,87,2147483647,158,237,63,76]
> >     pg 11.26 is stuck undersized for 1914.334332, current state active+undersized+degraded, last acting [2147483647,237,2147483647,263,158,148,181,180]
> >     pg 11.2d is stuck undersized for 1350.365988, current state active+undersized+degraded, last acting [2147483647,2147483647,73,229,86,158,169,84]
> >     pg 11.54 is stuck undersized for 1914.398125, current state active+undersized+degraded, last acting [231,169,2147483647,229,84,85,237,63]
> >     pg 11.5b is stuck undersized for 2047.980719, current state active+undersized+degraded, last acting [86,237,168,263,144,1,229,2147483647]
> >     pg 11.5e is stuck undersized for 873.643661, current state active+undersized+degraded, last acting [181,2147483647,229,158,231,1,169,2147483647]
> >     pg 11.62 is stuck undersized for 1144.491696, current state active+undersized+degraded, last acting [2147483647,85,235,74,63,234,181,2147483647]
> >     pg 11.6f is stuck undersized for 873.646628, current state active+undersized+degraded, last acting [234,3,2147483647,158,180,63,2147483647,181]
> > SLOW_OPS 9788 slow ops, oldest one blocked for 2953 sec, daemons [osd.0,osd.100,osd.101,osd.112,osd.118,osd.133,osd.136,osd.142,osd.144,osd.145]... have slow ops.
> >
> >
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx



