Hi,
Does the host that was taken down have 12 disks in it?
Have a look at the down PGs ('18 pgs down') - I suspect this is what is causing the I/O freeze.
Is your CRUSH map set up correctly to split data over different hosts?
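If it helps, something along these lines should show which PGs are down, which OSDs they map to, and how your CRUSH rule distributes data (exact output varies a bit by release):

    ceph health detail | grep down     # lists the down+peering PGs
    ceph pg dump_stuck inactive        # shows the acting/up OSD sets for stuck PGs
    ceph osd tree                      # host/OSD layout
    ceph osd crush rule dump           # check the failure domain is "host", not "osd"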
Thanks
On Tue, Sep 13, 2016 at 11:45 AM, Daznis <daznis@xxxxxxxxx> wrote:
No, no errors about that. I had set noout before it happened, but it
still started recovery. I added
nobackfill,norebalance,norecover,noscrub,nodeep-scrub once I noticed
it started doing crazy stuff. So recovery I/O stopped, but the cluster
can't read any data. Only writes to the cache layer go through.
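(Nothing special about how the flags were set - just the usual commands, in the order described above, and I plan to unset them the same way once the node is back:)

    ceph osd set noout                    # set before the reboot
    for f in nobackfill norebalance norecover noscrub nodeep-scrub; do
        ceph osd set $f                   # added after recovery kicked in
    done
    # to revert later: ceph osd unset <flag> for each one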
    cluster cdca2074-4c91-4047-a607-faebcbc1ee17
     health HEALTH_WARN
            2225 pgs degraded
            18 pgs down
            18 pgs peering
            89 pgs stale
            2225 pgs stuck degraded
            18 pgs stuck inactive
            89 pgs stuck stale
            2257 pgs stuck unclean
            2225 pgs stuck undersized
            2225 pgs undersized
            recovery 4180820/11837906 objects degraded (35.317%)
            recovery 24016/11837906 objects misplaced (0.203%)
            12/39 in osds are down
            noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub flag(s) set
     monmap e9: 7 mons at {}
            election epoch 170, quorum 0,1,2,3,4,5,6
     osdmap e40290: 40 osds: 27 up, 39 in; 14 remapped pgs
            flags noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub
      pgmap v39326300: 4096 pgs, 4 pools, 21455 GB data, 5780 kobjects
            42407 GB used, 75772 GB / 115 TB avail
            4180820/11837906 objects degraded (35.317%)
            24016/11837906 objects misplaced (0.203%)
                2136 active+undersized+degraded
                1837 active+clean
                  89 stale+active+undersized+degraded
                  18 down+peering
                  14 active+remapped
                   2 active+clean+scrubbing+deep
  client io 0 B/s rd, 9509 kB/s wr, 3469 op/s
On Tue, Sep 13, 2016 at 1:34 PM, M Ranga Swami Reddy
<swamireddy@xxxxxxxxx> wrote:
> Please check whether any OSD is reporting a nearfull/full error. Can you please
> share the ceph -s output?
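> e.g. (assuming Hammer or later, where "ceph osd df" is available):
>
>     ceph osd df                          # per-OSD utilisation; nearfull OSDs stand out
>     ceph health detail | grep -i full    # lists any nearfull/full warnings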
>
> Thanks
> Swami
>
> On Tue, Sep 13, 2016 at 3:54 PM, Daznis <daznis@xxxxxxxxx> wrote:
>>
>> Hello,
>>
>>
>> I have encountered a strange I/O freeze while rebooting one OSD node
>> for maintenance purposes. It was one of the 3 nodes in the entire
>> cluster. Before this, rebooting or shutting down an entire node just
>> slowed Ceph down, but did not completely freeze it.
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com