Hello.
I'm running a small experimental setup: two hosts, each with a few OSDs.
One OSD has been taken down intentionally, yet despite the surviving
OSDs on the other host, I see that all I/O (rbd, and even rados get) has
been hanging for a long time (more than 30 minutes already).
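My first suspicion is the pool's replication settings: if min_size
equals size (e.g. 2/2), losing one replica leaves the PGs peered but
not active, and I/O blocks. A quick way to check (just a sketch; 'ssd'
is the pool name from the stats output below):

# current replication settings of the pool
ceph osd pool get ssd size
ceph osd pool get ssd min_size

# temporarily allow I/O with a single surviving replica
# (risky, for testing only)
ceph osd pool set ssd min_size 1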
My configuration (ceph osd tree, ssd root):
 -9 2.00000 root ssd
-11 1.00000     host ssd-pp7
  9 1.00000         osd.9  down        0 1.00000
-12 1.00000     host ssd-pp11
  1 0.25000         osd.1    up  1.00000 1.00000
  2 0.25000         osd.2    up  1.00000 1.00000
  3 0.25000         osd.3    up  1.00000 1.00000
 11 0.25000         osd.11   up  1.00000 1.00000
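For completeness: if the CRUSH rule for this root replicates across
hosts (the usual default), then with ssd-pp7 down there is simply no
second host left to place a replica on, so 'undersized' is expected.
A sketch of how to inspect the rule:

# list CRUSH rules; the interesting part is whether the rule does
# 'chooseleaf ... type host'
ceph osd crush rule dump

# which rule the 'ssd' pool uses ('crush_ruleset' on older releases,
# 'crush_rule' on newer ones)
ceph osd pool get ssd crush_ruleset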
The pg map shows that the acting set was moved from osd.9 to the other
OSDs.
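For example, for one of the affected PGs:

# osdmap epoch plus up/acting sets for a single PG
ceph pg map 26.0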
root@pp11:~# ceph health detail
HEALTH_ERR 5 pgs are stuck inactive for more than 300 seconds; 5 pgs
degraded; 5 pgs stuck inactive; 8 pgs stuck unclean; 5 pgs undersized;
53 requests are blocked > 32 sec; 2 osds have slow requests; recovery
2538/8200 objects degraded (30.951%); recovery 1562/8200 objects
misplaced (19.049%); too few PGs per OSD (1 < min 30)
pg 26.0 is stuck inactive for 1429.756078, current state
undersized+degraded+peered, last acting [1]
pg 26.7 is stuck inactive for 1429.751221, current state
undersized+degraded+peered, last acting [2]
pg 26.2 is stuck inactive for 1429.749713, current state
undersized+degraded+peered, last acting [1]
pg 26.6 is stuck inactive for 1429.763065, current state
undersized+degraded+peered, last acting [2]
pg 26.5 is stuck inactive for 1429.754325, current state
undersized+degraded+peered, last acting [1]
pg 26.0 is stuck unclean for 1429.756101, current state
undersized+degraded+peered, last acting [1]
pg 26.1 is stuck unclean for 1429.778469, current state active+remapped,
last acting [11,3]
pg 26.2 is stuck unclean for 1429.749733, current state
undersized+degraded+peered, last acting [1]
pg 26.3 is stuck unclean for 1429.796471, current state active+remapped,
last acting [1,2]
pg 26.4 is stuck unclean for 1429.762425, current state active+remapped,
last acting [1,3]
pg 26.5 is stuck unclean for 1429.754349, current state
undersized+degraded+peered, last acting [1]
pg 26.6 is stuck unclean for 1429.763094, current state
undersized+degraded+peered, last acting [2]
pg 26.7 is stuck unclean for 1429.751259, current state
undersized+degraded+peered, last acting [2]
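As far as I understand, the five PGs stuck in 'peered' (rather than
'active') are what blocks the I/O. A per-PG view of why peering does
not complete (a sketch, using pg 26.0 from above):

# detailed peering state for one stuck PG; 'recovery_state' should
# show what it is waiting for
ceph pg 26.0 query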
root@pp11:~# ceph osd pool stats ssd
pool ssd id 26
nothing is going on
The mons are all up and in quorum.
The osd dump entry for the downed OSD:
osd.9 down out weight 0 up_from 1055 up_thru 1085 down_at 1089
last_clean_interval [1017,1052) 78.140.137.210:6800/29731
78.140.137.210:6801/29731 78.140.137.210:6802/29731
78.140.137.210:6803/29731 autoout,exists
2fc49cd5-e48c-4189-a67b-229d09378d1c
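To see what the blocked requests on the surviving OSDs are actually
waiting for, I could also dump in-flight ops (a sketch; this has to run
on ssd-pp11, where osd.1 lives):

# in-flight (possibly stuck) ops on a live OSD, via the admin socket
ceph daemon osd.1 dump_ops_in_flight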
What should normally happen in this situation, and why is it not
happening?