Guys,
After updating from 0.94.6 to 0.94.7, every time I replace a broken OSD (1 out of 300) I get flooded with "[WRN] failed to encode map eXXX with expected crc", and the number of blocked requests (> 32 sec) increases drastically, eventually killing all radosgw sessions.
Nothing changed in our cluster except the version update, and before it we never had any issues like this; the cluster handled disk replacements quite well.
The procedure used for the OSD replacement is the following:
Removing the dead disk:
ceph osd out <id>
ceph osd crush remove osd.<id>
-> here the problem starts
ceph osd rm <id>
ceph auth del osd.<id>
Adding a new OSD:
ceph-deploy disk zap <node>:/dev/<disk>
ceph-deploy --overwrite-conf osd prepare <node>:<disk>:/dev/<journal partition>
ceph-deploy --overwrite-conf osd activate <node>:<disk>:/dev/<journal partition>
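For reference, here is the same procedure collected into one dry-run script (the ID, node, disk, and journal values below are hypothetical placeholders, not from our cluster; the script only prints the commands it would run):

```shell
#!/bin/sh
# Dry-run sketch of the OSD replacement procedure above.
# ID, NODE, DISK, and JOURNAL are placeholder values -- adjust for your cluster.
set -eu

ID=42
NODE=node01
DISK=sdb
JOURNAL=/dev/sdc1

# 'run' only echoes each command; swap the echo for real execution when ready.
run() { echo "+ $*"; }

# Remove the dead OSD
run ceph osd out "$ID"
run ceph osd crush remove "osd.$ID"
run ceph osd rm "$ID"
run ceph auth del "osd.$ID"

# Add the replacement OSD
run ceph-deploy disk zap "$NODE:/dev/$DISK"
run ceph-deploy --overwrite-conf osd prepare "$NODE:$DISK:$JOURNAL"
run ceph-deploy --overwrite-conf osd activate "$NODE:$DISK:$JOURNAL"
```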
Cluster status and warning flood during the replacement:
cluster xxx
health HEALTH_WARN
97 pgs backfill
12 pgs backfilling
3 pgs peering
2 pgs stuck inactive
112 pgs stuck unclean
242 requests are blocked > 32 sec
recovery 111320/18148458 objects misplaced (0.613%)
monmap e1: 3 mons at {mon001=xxx:6789/0,mon002=xxx:6789/0,mon003=xxx:6789/0}
election epoch 526, quorum 0,1,2 mon001,mon002,mon003
osdmap e134086: 296 osds: 296 up, 296 in; 108 remapped pgs
pgmap v12721457: 18368 pgs, 15 pools, 17163 GB data, 5889 kobjects
55811 GB used, 397 TB / 451 TB avail
111320/18148458 objects misplaced (0.613%)
18254 active+clean
97 active+remapped+wait_backfill
12 active+remapped+backfilling
3 peering
1 active+clean+scrubbing+deep
1 active+clean+scrubbing
recovery io 42311 kB/s, 15 objects/s
client io 5205 B/s rd, 6 op/s
2016-06-02 11:22:56.319615 osd.43 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.320236 osd.21 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.320862 osd.60 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.322256 osd.21 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.322833 osd.60 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.324521 osd.21 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.324533 osd.60 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.326382 osd.21 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.326716 osd.60 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.328460 osd.60 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.328500 osd.21 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.330503 osd.60 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.330517 osd.43 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.330671 osd.21 [WRN] failed to encode map e134066 with expected crc
Kind regards,
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com