Re: Upgrading 2K OSDs from Hammer to Jewel. Our experience

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Mar 11, 2017 at 12:21 PM, <cephmailinglist@xxxxxxxxx> wrote:
>
> The next and biggest problem we encountered had to do with the CRC errors on the OSD map. On every map update, the OSDs that were not upgraded yet, got that CRC error and asked the monitor for a full OSD map instead of just a delta update. At first we did not understand what exactly happened, we ran the upgrade per node using a script and in that script we watch the state of the cluster and when the cluster is healthy again, we upgrade the next host. Every time we started the script (skipping the already upgraded hosts) the first host(s) upgraded without issues and then we got blocked I/O on the cluster. The blocked I/O went away within a minute of 2 (not measured). After investigation we found out that the blocked I/O happened when nodes where asking the monitor for a (full) OSD map and that resulted shortly in a full saturated network link on our monitor.


Thanks for the detailed upgrade report. I wanted to zoom in on this
CRC/fullmap issue because it could be quite disruptive for us when we
upgrade from hammer to jewel.

I've read various reports that the fool proof way to avoid the full
map DoS would be to upgrade all OSDs to jewel before the mon's.
Did anyone have success with that workaround? I'm cc'ing Bryan because
he knows this issue very well.

Cheers, Dan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux