Re: [EXTERNAL] Nautilus: Taking out OSDs that are 'Failure Pending'

Marking them OUT first is the way to go.  As long as the OSDs stay UP, they can and will participate in the recovery.  How many you can mark out at one time will depend on how sensitive your client I/O is to background recovery, and all of the related tunings.  If you have the hours/days to spare, it is definitely easier on the cluster to do them one at a time.
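A rough sketch of the one-at-a-time approach (the OSD IDs below are placeholders - substitute your three failing ones):

    ceph osd out 10     # mark the first failing OSD out; it stays UP and helps backfill its data elsewhere
    ceph -s             # watch recovery/misplaced objects until the cluster settles back to HEALTH_OK
    ceph osd out 11     # then repeat for the next one, and so on

ceph osd out also accepts several IDs in one command if you decide the cluster can absorb marking all three out together.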

Thank you,
Josh Beaman

From: Dave Hall <kdhall@xxxxxxxxxxxxxx>
Date: Friday, August 4, 2023 at 8:45 AM
To: ceph-users <ceph-users@xxxxxxx>
Cc: anthony.datri <anthony.datri@xxxxxxxxx>
Subject: [EXTERNAL] Nautilus: Taking out OSDs that are 'Failure Pending'
Hello.  It's been a while.  I have a Nautilus cluster with 72 x 12GB HDD
OSDs (BlueStore) and mostly EC 8+2 pools/PGs.  It's been working great -
some nodes went nearly 900 days without a reboot.

As of yesterday I found that I have 3 OSDs with a SMART status of 'Pending
Failure'.  New drives are ordered and will be here next week.  There is a
procedure in the documentation for replacing an OSD, but I can't do that
directly until I receive the drives.

My inclination is to mark these 3 OSDs 'OUT' before they crash completely,
but I want to confirm my understanding of Ceph's response to this.  Mainly,
given my EC pools (or replicated pools for that matter), if I mark all 3
OSDs out at once, will I risk data loss?

If I have it right, marking an OSD out will simply cause Ceph to move all
of the PG shards from that OSD to other OSDs, so no major risk of data
loss.  However, if it would be better to do them one per day or something,
I'd rather be safe.

I also assume that I should wait for the rebalance to complete before I
initiate the replacement procedure.
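Presumably something like the safe-to-destroy loop from the docs' OSD
replacement procedure would confirm that before I touch the drive (the
osd ID below is a placeholder):

    while ! ceph osd safe-to-destroy osd.10 ; do sleep 60 ; done   # blocks until that OSD's PGs are fully recovered elsewhere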

Your thoughts?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx