Re: [Jewel] upgrade 10.2.3 => 10.2.5 KO : first OSD server freeze every two days :)

First of all, don't upgrade Ceph while your cluster is in a warning or error state. An upgrade must start from a clean cluster.
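As a rough illustration of that pre-upgrade check, here is a minimal Python sketch. It assumes the `ceph` CLI is on the PATH and that `ceph health` prints a status token such as HEALTH_OK, HEALTH_WARN or HEALTH_ERR at the start of its output. The function names are made up for this example.

```python
import subprocess


def health_status(health_output: str) -> str:
    """Return the leading status token (e.g. HEALTH_OK) from `ceph health` output."""
    tokens = health_output.strip().split()
    return tokens[0] if tokens else ""


def safe_to_upgrade(health_output: str) -> bool:
    """Only a clean cluster (HEALTH_OK) is considered safe to upgrade."""
    return health_status(health_output) == "HEALTH_OK"


def read_cluster_health() -> str:
    """Run `ceph health` and return its raw output (assumes the ceph CLI is installed)."""
    result = subprocess.run(
        ["ceph", "health"], check=True, capture_output=True, text=True
    )
    return result.stdout
```

On an admin node you would call `safe_to_upgrade(read_cluster_health())` and stop the upgrade if it returns False.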

Don't stay with a replica count of 2. Most problems come from that: just look at the advice given by experienced users on this list. You should set a replica count (size) of 3 and a min_size of 2. This will keep you from losing data to a double fault, which is frequent.
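The size/min_size change is applied per pool with `ceph osd pool set`. The sketch below only builds the two command lines so they can be reviewed before anything is run. The pool name "rbd" in the usage note is an assumption, and the helper name is invented for this example.

```python
from typing import List


def replication_commands(pool: str, size: int = 3, min_size: int = 2) -> List[List[str]]:
    """Build the `ceph osd pool set` commands for a pool's size and min_size."""
    if not 1 <= min_size <= size:
        raise ValueError("min_size must be between 1 and size")
    return [
        ["ceph", "osd", "pool", "set", pool, "size", str(size)],
        ["ceph", "osd", "pool", "set", pool, "min_size", str(min_size)],
    ]
```

For example, `replication_commands("rbd")` yields the commands `ceph osd pool set rbd size 3` and `ceph osd pool set rbd min_size 2`. Raising size triggers backfill, so it is better done on a healthy cluster.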

For your specific problem, I have no idea of the root cause. If you have already checked your network (tuning parameters, jumbo frames, etc.), the software version on all components, and your hardware (RAID card, system messages, ...), maybe you should just reinstall your first OSD server. I had a big problem after an upgrade from hammer to jewel, and nobody else seems to have encountered it doing the same operation. All servers were configured the same way, but they did not have the same history. We found that the problem came from the different versions that had been installed over time on some OSD servers (giant -> hammer -> jewel). OSD servers that had never run giant had no problem at all. On the problematic servers, now running jewel, we hit bugs that had been fixed years ago in giant! So we had to isolate those servers and reinstall them directly with jewel: that solved the problem.
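For the "software version on all components" check, a small helper like the one below can make a mixed-version cluster obvious. It is only a sketch: it assumes you have already collected a mapping of daemon name to reported version yourself (for instance from the output of `ceph tell osd.* version`, whose parsing is not shown here), and the function names are hypothetical. Note that it only compares the versions currently running; it cannot reveal the install history of a server, which was the real issue in my case.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_version(reported: Dict[str, str]) -> Dict[str, List[str]]:
    """Group daemon names by the version string each one reported."""
    groups = defaultdict(list)
    for daemon, version in sorted(reported.items()):
        groups[version].append(daemon)
    return dict(groups)


def versions_are_uniform(reported: Dict[str, str]) -> bool:
    """True when every daemon reports the same version (or none were given)."""
    return len(set(reported.values())) <= 1
```

If `versions_are_uniform` returns False, `group_by_version` shows which daemons are on which release, so the odd ones can be looked at first.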
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
