Yes, that one has 2 more OSDs on it.

root default {
        id -1           # do not change unnecessarily
        # weight 116.480
        alg straw
        hash 0  # rjenkins1
        item OSD-1 weight 36.400
        item OSD-2 weight 36.400
        item OSD-3 weight 43.680
}

rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

On Tue, Sep 13, 2016 at 1:51 PM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
> Hi,
>
> The host that was taken down has 12 disks in it?
>
> Have a look at the down PGs ('18 pgs down') - I suspect this is what is
> causing the I/O freeze.
>
> Is your CRUSH map set up correctly to split data over different hosts?
>
> Thanks
>
> On Tue, Sep 13, 2016 at 11:45 AM, Daznis <daznis@xxxxxxxxx> wrote:
>>
>> No, no errors about that. I had set noout before it happened, but it
>> still started recovery. I added
>> nobackfill,norebalance,norecover,noscrub,nodeep-scrub once I noticed
>> it started doing crazy stuff. So the recovery I/O stopped, but the
>> cluster can't read any data - only writes to the cache layer get through.
>>
>>     cluster cdca2074-4c91-4047-a607-faebcbc1ee17
>>      health HEALTH_WARN
>>             2225 pgs degraded
>>             18 pgs down
>>             18 pgs peering
>>             89 pgs stale
>>             2225 pgs stuck degraded
>>             18 pgs stuck inactive
>>             89 pgs stuck stale
>>             2257 pgs stuck unclean
>>             2225 pgs stuck undersized
>>             2225 pgs undersized
>>             recovery 4180820/11837906 objects degraded (35.317%)
>>             recovery 24016/11837906 objects misplaced (0.203%)
>>             12/39 in osds are down
>>             noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub flag(s) set
>>      monmap e9: 7 mons at {}
>>             election epoch 170, quorum 0,1,2,3,4,5,6
>>      osdmap e40290: 40 osds: 27 up, 39 in; 14 remapped pgs
>>             flags noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub
>>       pgmap v39326300: 4096 pgs, 4 pools, 21455 GB data, 5780 kobjects
>>             42407 GB used, 75772 GB / 115 TB avail
>>             4180820/11837906 objects degraded (35.317%)
>>             24016/11837906 objects misplaced (0.203%)
>>                 2136 active+undersized+degraded
>>                 1837 active+clean
>>                   89 stale+active+undersized+degraded
>>                   18 down+peering
>>                   14 active+remapped
>>                    2 active+clean+scrubbing+deep
>>   client io 0 B/s rd, 9509 kB/s wr, 3469 op/s
>>
>> On Tue, Sep 13, 2016 at 1:34 PM, M Ranga Swami Reddy
>> <swamireddy@xxxxxxxxx> wrote:
>> > Please check whether any OSD is reporting a nearfull error. Can you
>> > please share the ceph -s output?
>> >
>> > Thanks
>> > Swami
>> >
>> > On Tue, Sep 13, 2016 at 3:54 PM, Daznis <daznis@xxxxxxxxx> wrote:
>> >>
>> >> Hello,
>> >>
>> >> I have encountered a strange I/O freeze while rebooting one OSD node
>> >> for maintenance. It was one of the 3 nodes in the entire cluster.
>> >> Before this, rebooting or shutting down an entire node just slowed
>> >> Ceph down, but never completely froze it.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
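
For anyone else hitting this, the two checks suggested above - whether the CRUSH rule really splits replicas across hosts, and what state the down PGs are in - can be run roughly as follows. This is a minimal sketch; the file names, rule number, replica count and <pgid> below are placeholders, not values taken from this cluster.

    # Confirm the hierarchy: each host bucket should hold only its own OSDs
    ceph osd tree

    # Decompile the CRUSH map for review
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # Dry-run the rule: every mapping should pick OSDs from different hosts
    # (--rule 0 and --num-rep 2 are assumptions; use your pool's actual size)
    crushtool -i crushmap.bin --test --rule 0 --num-rep 2 --show-mappings | head

    # Inspect the PGs that are down/inactive and blocking reads
    ceph health detail | grep down
    ceph pg dump_stuck inactive
    ceph pg <pgid> query    # replace <pgid> with one of the down PGs

If a mapping in the crushtool output ever lists two OSDs from the same host, the rule is not separating replicas the way the thread assumes it should.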