Hi Daznis,

Something is not quite right. You have pools with 2 replicas (right?). The fact
that you have 18 down PGs means that both of the OSDs acting on those PGs are
having problems. You should try to work out which PGs are down and which OSDs
are acting on them ('ceph pg dump_stuck' or 'ceph health detail' should give you
that info). Maybe from there you can find the other problematic OSD(s). Then try
'ceph pg <id> query' and see if you get any further info (a short command sketch
follows at the end of this thread).

Cheers
Goncalo

________________________________________
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Daznis [daznis@xxxxxxxxx]
Sent: 13 September 2016 21:10
To: Sean Redmond
Cc: ceph-users
Subject: Re: I/O freeze while a single node is down.

Yes, that one has +2 OSDs on it.

root default {
        id -1           # do not change unnecessarily
        # weight 116.480
        alg straw
        hash 0          # rjenkins1
        item OSD-1 weight 36.400
        item OSD-2 weight 36.400
        item OSD-3 weight 43.680
}

rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

On Tue, Sep 13, 2016 at 1:51 PM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
> Hi,
>
> The host that was taken down has 12 disks in it?
>
> Have a look at the down PGs ('18 pgs down') - I suspect this is what is
> causing the I/O freeze.
>
> Is your CRUSH map set up correctly to split data over different hosts?
>
> Thanks
>
> On Tue, Sep 13, 2016 at 11:45 AM, Daznis <daznis@xxxxxxxxx> wrote:
>>
>> No, no errors about that. I had set noout before it happened, but it
>> still started recovery. I added
>> nobackfill,norebalance,norecover,noscrub,nodeep-scrub once I noticed
>> it started doing crazy stuff. So recovery I/O stopped, but the cluster
>> can't read any data; only writes to the cache layer go through.
>>
>>     cluster cdca2074-4c91-4047-a607-faebcbc1ee17
>>      health HEALTH_WARN
>>             2225 pgs degraded
>>             18 pgs down
>>             18 pgs peering
>>             89 pgs stale
>>             2225 pgs stuck degraded
>>             18 pgs stuck inactive
>>             89 pgs stuck stale
>>             2257 pgs stuck unclean
>>             2225 pgs stuck undersized
>>             2225 pgs undersized
>>             recovery 4180820/11837906 objects degraded (35.317%)
>>             recovery 24016/11837906 objects misplaced (0.203%)
>>             12/39 in osds are down
>>             noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub flag(s) set
>>      monmap e9: 7 mons at {}
>>             election epoch 170, quorum 0,1,2,3,4,5,6
>>      osdmap e40290: 40 osds: 27 up, 39 in; 14 remapped pgs
>>             flags noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub
>>       pgmap v39326300: 4096 pgs, 4 pools, 21455 GB data, 5780 kobjects
>>             42407 GB used, 75772 GB / 115 TB avail
>>             4180820/11837906 objects degraded (35.317%)
>>             24016/11837906 objects misplaced (0.203%)
>>                 2136 active+undersized+degraded
>>                 1837 active+clean
>>                   89 stale+active+undersized+degraded
>>                   18 down+peering
>>                   14 active+remapped
>>                    2 active+clean+scrubbing+deep
>>       client io 0 B/s rd, 9509 kB/s wr, 3469 op/s
>>
>> On Tue, Sep 13, 2016 at 1:34 PM, M Ranga Swami Reddy
>> <swamireddy@xxxxxxxxx> wrote:
>> > Please check if any OSD is near full (nearfull ERR). Can you please
>> > share the 'ceph -s' output?
>> >
>> > Thanks
>> > Swami
>> >
>> > On Tue, Sep 13, 2016 at 3:54 PM, Daznis <daznis@xxxxxxxxx> wrote:
>> >>
>> >> Hello,
>> >>
>> >> I have encountered a strange I/O freeze while rebooting one OSD node
>> >> for maintenance. It was one of the 3 nodes in the entire cluster.
>> >> Before this, rebooting or shutting down an entire node just slowed
>> >> Ceph down, but did not completely freeze it.
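
A minimal sketch of the diagnostics suggested above, assuming a Hammer/Jewel-era
'ceph' CLI (which matches the status output shown); the PG id 1.2f is only a
placeholder and should be replaced with one of the PGs actually reported as down:

    # list the problem PGs together with their up/acting OSD sets
    ceph health detail
    ceph pg dump_stuck inactive

    # query one of the down PGs for peering details and blocking OSDs
    ceph pg 1.2f query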
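For Sean's question about splitting data over different hosts, a sketch of checks
one could run; the pool name 'rbd' is only a placeholder for the pools in this
cluster:

    # the rule above uses 'step chooseleaf firstn 0 type host', so verify that
    # OSD-1/OSD-2/OSD-3 are host-type buckets, each holding its own OSDs
    ceph osd tree
    ceph osd crush rule dump replicated_ruleset

    # with 2 replicas, each pool's size/min_size determine whether I/O can
    # continue while a whole host is down
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size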