Hi,
Are you sure all OSDs have been updated to 0.94.7? Those messages should only be printed by 0.94.6 OSDs trying to handle messages from a 0.94.7 ceph-mon.
Also, see the thread about the 0.94.7 release -- I mentioned a workaround there.
--
Dan
On Thu, Jun 2, 2016 at 11:29 AM, Romero Junior <r.junior@xxxxxxxxxxxxxxxxxxx> wrote:
Guys,
After the update to 0.94.7 (from 0.94.6), every time I replace a broken OSD (1 out of 300) I get flooded with "[WRN] failed to encode map eXXX with expected crc" messages, and the number of blocked requests (> 32 secs) increases drastically, which ends up killing all radosgw sessions.
Nothing changed in our cluster except the version update. Before that we never had any issues like this; the cluster handled disk replacements quite well.
The procedure used for the OSD replacement is the following:
Removing the dead disk:
ceph osd out <id>
ceph osd crush remove osd.<id>   (here the problem starts)
ceph osd rm <id>
ceph auth del osd.<id>
Adding a new OSD:
ceph-deploy disk zap <node>:/dev/<disk>
ceph-deploy --overwrite-conf osd prepare <node>:<disk>:/dev/<journal partition>
ceph-deploy --overwrite-conf osd activate <node>:<disk>:/dev/<journal partition>
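The replacement procedure above can be sketched as a small shell helper. This is a hypothetical wrapper, not part of Ceph or ceph-deploy: the names `run`, `remove_osd`, `add_osd` and the `DRY_RUN` variable are assumptions for illustration. It issues exactly the commands listed above, in the same order; by default it only prints them so the sequence can be reviewed before anything touches the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper sketching the OSD replacement steps listed above.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 runs it.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Remove a dead OSD by numeric id; stops at the first failing step.
remove_osd() {
  local id="$1"
  run ceph osd out "$id" &&
    run ceph osd crush remove "osd.$id" &&  # step where the warning flood was reported to start
    run ceph osd rm "$id" &&
    run ceph auth del "osd.$id"
}

# Zap, prepare and activate a new OSD (node, data disk, journal partition),
# mirroring the ceph-deploy invocations above; stops at the first failure.
add_osd() {
  local node="$1" disk="$2" journal="$3"
  run ceph-deploy disk zap "$node:/dev/$disk" &&
    run ceph-deploy --overwrite-conf osd prepare "$node:$disk:/dev/$journal" &&
    run ceph-deploy --overwrite-conf osd activate "$node:$disk:/dev/$journal"
}
```

For example, `remove_osd 42` only prints the four removal commands for osd.42, while `DRY_RUN=0 add_osd node1 sdb sda5` would actually run the three ceph-deploy steps (node1, sdb and sda5 are placeholder names). Chaining the steps with `&&` means a failed step is not followed by the next one.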
Warning flood messages:
    cluster xxx
     health HEALTH_WARN
            97 pgs backfill
            12 pgs backfilling
            3 pgs peering
            2 pgs stuck inactive
            112 pgs stuck unclean
            242 requests are blocked > 32 sec
            recovery 111320/18148458 objects misplaced (0.613%)
     monmap e1: 3 mons at {mon001=xxx:6789/0,mon002=xxx:6789/0,mon003=xxx:6789/0}
            election epoch 526, quorum 0,1,2 mon001,mon002,mon003
     osdmap e134086: 296 osds: 296 up, 296 in; 108 remapped pgs
      pgmap v12721457: 18368 pgs, 15 pools, 17163 GB data, 5889 kobjects
            55811 GB used, 397 TB / 451 TB avail
            111320/18148458 objects misplaced (0.613%)
               18254 active+clean
                  97 active+remapped+wait_backfill
                  12 active+remapped+backfilling
                   3 peering
                   1 active+clean+scrubbing+deep
                   1 active+clean+scrubbing
recovery io 42311 kB/s, 15 objects/s
  client io 5205 B/s rd, 6 op/s
2016-06-02 11:22:56.319615 osd.43 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.320236 osd.21 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.320862 osd.60 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.322256 osd.21 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.322833 osd.60 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.324521 osd.21 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.324533 osd.60 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.326382 osd.21 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.326716 osd.60 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.328460 osd.60 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.328500 osd.21 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.330503 osd.60 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.330517 osd.43 [WRN] failed to encode map e134066 with expected crc
2016-06-02 11:22:56.330671 osd.21 [WRN] failed to encode map e134066 with expected crc
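If, as Dan suggests, only OSDs still running 0.94.6 print this warning, tallying which daemons emit it narrows down the ones to check. A minimal sketch, assuming the log lines look like the ones above (for example as captured from `ceph -w`); the function name `count_crc_warnings` is hypothetical.

```shell
#!/usr/bin/env bash
# count_crc_warnings (hypothetical): read cluster-log lines on stdin and print
# "<osd> <count>" for every OSD that logged "failed to encode map ... with
# expected crc", most frequent first (ties sorted by OSD name as text).
count_crc_warnings() {
  awk '/failed to encode map .* with expected crc/ {
         # count the first field that looks like an OSD name, e.g. "osd.43"
         for (i = 1; i <= NF; i++)
           if ($i ~ /^osd\.[0-9]+$/) { n[$i]++; break }
       }
       END { for (o in n) print o, n[o] }' | sort -k2,2nr -k1,1
}
```

Fed the fourteen lines above, it reports osd.21 and osd.60 six times each and osd.43 twice. `ceph tell osd.21 version` then shows the version that daemon is actually running; an OSD whose package was upgraded but which was never restarted keeps running the old code.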
Kind regards,
Romero Junior
DevOps Infra Engineer
LeaseWeb Global Services B.V.
T: +31 20 316 0230
M: +31 6 2115 9310
E: r.junior@xxxxxxxxxxxxxxxxxxx
W: www.leaseweb.com
Luttenbergweg 8, 1101 EC Amsterdam, Netherlands
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com