Re: disk controller failure

On 13.12.2018 18:19, Alex Gorbachev wrote:
On Thu, Dec 13, 2018 at 10:48 AM Dietmar Rieder
<dietmar.rieder@xxxxxxxxxxx> wrote:
Hi Cephers,

one of our OSD nodes is experiencing a Disk controller problem/failure
(frequent resetting), so the OSDs on this controller are flapping
(up/down in/out).

I will hopefully get the replacement part soon.

I have some simple questions: what are the best steps to take now, before
and after replacement of the controller?

- marking down and shutting down all OSDs on that node?
- waiting until the rebalance has finished
- replacing the controller
- just restarting the OSDs? Or redeploying them, since they still hold data?

We are running:

ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous
(stable)
CentOS 7.5

Sorry for my naive questions.

I usually run "ceph osd set noout" first to prevent any recovery.

Then replace the hardware and make sure all OSDs come back online.

Then run "ceph osd unset noout".

Best regards,
Alex
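
The noout sequence described above could be sketched roughly as follows. This is only an illustration, not a tested procedure for this cluster: the run helper, the DRY_RUN variable and the function names are hypothetical, and the "ceph osd tree" check before clearing the flag is an added assumption about how to confirm that the OSDs are back.

```shell
#!/bin/sh
# Sketch of a noout maintenance window (assumption: run from a node
# with a working ceph admin keyring).
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

begin_maintenance() {
    # Stop down OSDs from being marked out, so no recovery is triggered.
    run ceph osd set noout
}

end_maintenance() {
    # Show the OSD tree so the up/in state can be checked by eye,
    # then clear the flag again.
    run ceph osd tree
    run ceph osd unset noout
}

begin_maintenance
# ... shut down the node, replace the controller, boot, let the OSDs start ...
end_maintenance
```

With the default DRY_RUN=1 the script only prints the three ceph commands, which makes it easy to review the order before running anything against a live cluster.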


Setting noout prevents the cluster from re-balancing while OSDs are down. It is meant for a short fix, e.g. a reboot or similar, where you do not want re-balancing to start because you know the data will be available again shortly.

If OSDs are flapping, you normally want them out of the cluster so that they no longer impact performance.


kind regards

Ronny Aasen
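
Taking the flapping OSDs out of the cluster, as suggested above, might look something like the sketch below. The OSD ids 12, 13 and 14 are placeholders, not the ids from this cluster, and the run helper, DRY_RUN and FLAPPING_OSDS names are hypothetical. Stopping the daemons via systemctl is an assumption based on the original question and a systemd host such as CentOS 7; that command has to be run on the OSD node itself.

```shell
#!/bin/sh
# Sketch: mark the OSDs behind the failing controller out and stop them.
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Placeholder ids: replace with the OSDs attached to the bad controller.
FLAPPING_OSDS="${FLAPPING_OSDS:-12 13 14}"

mark_out_and_stop() {
    for id in $FLAPPING_OSDS; do
        # Marking an OSD out lets the cluster move its data elsewhere.
        run ceph osd out "$id"
        # Stopping the daemon ends the up/down flapping (run on the OSD node).
        run systemctl stop "ceph-osd@$id"
    done
}

mark_out_and_stop
```

Unlike the noout approach, this lets the cluster re-balance away from the affected OSDs, so it trades extra data movement for stable performance while waiting for the replacement part.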


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



