Proper procedure to replace DB/WAL SSD

Caspar Smit <casparsmit@xxxxxxxxxxx> · Fri, 23 Feb 2018 14:27:18 +0100

Hi All,

What would be the proper way to preventively replace a DB/WAL SSD (one that is nearing its DWPD/TBW limit but has not failed yet)?

It hosts the DB partitions for 5 OSDs.

Maybe something like:

1) ceph osd reweight the 5 OSDs to 0
2) let backfilling complete
3) destroy/remove the 5 OSDs
4) replace the SSD
5) create 5 new OSDs with separate DB partitions on the new SSD
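
As a rough sketch, steps 1 and 3 of this list could be scripted as below. The OSD ids 10 to 14 are hypothetical placeholders, and the `run` helper with its `DRY_RUN` switch is only an illustration so the commands can be reviewed before anything touches the cluster. Steps 4 and 5 are left out because they depend on the hardware and on the tool used to create the OSDs.

```shell
#!/bin/sh
# Sketch only: drain and remove the OSDs that share the DB/WAL SSD.
# OSD_IDS is a hypothetical placeholder, not taken from a real cluster.
OSD_IDS="${OSD_IDS:-10 11 12 13 14}"
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: set the reweight value to 0 so data migrates off each OSD.
drain_osds() {
    for id in $OSD_IDS; do
        run ceph osd reweight "$id" 0
    done
}

# Step 3: run only after backfilling has completed (step 2, see `ceph -s`).
remove_osds() {
    for id in $OSD_IDS; do
        run systemctl stop "ceph-osd@$id"
        run ceph osd purge "$id" --yes-i-really-mean-it
    done
}

drain_osds
```

With the default `DRY_RUN=1` this only prints the reweight commands; `remove_osds` would be called separately once the cluster is healthy again.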

When these 5 OSDs are big HDDs (8TB), a LOT of data has to be moved, so I thought maybe the following would work:

1) ceph osd set noout
2) stop the 5 OSDs (systemctl stop)
3) 'dd' the old SSD to a new SSD of same or bigger size
4) remove the old SSD
5) start the 5 OSDs (systemctl start)
6) let backfilling/recovery complete (only the delta between OSD stop and now)
7) ceph osd unset noout
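
Spelled out as commands, the second procedure might look like the sketch below. The OSD ids and the device names `/dev/sdX` and `/dev/sdY` are hypothetical placeholders, the `dd` options are just one reasonable choice, and the `run` helper only prints the commands unless `DRY_RUN=0` is set. Whether the OSDs actually come back cleanly on the cloned SSD is exactly the open question, so this is untested.

```shell
#!/bin/sh
# Sketch only: clone the DB/WAL SSD with dd while its OSDs are stopped.
# OSD_IDS, OLD_SSD and NEW_SSD are hypothetical placeholders.
OSD_IDS="${OSD_IDS:-10 11 12 13 14}"
OLD_SSD="${OLD_SSD:-/dev/sdX}"
NEW_SSD="${NEW_SSD:-/dev/sdY}"
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1-3: keep the OSDs from being marked out, stop them, copy the SSD.
stop_and_clone() {
    run ceph osd set noout
    for id in $OSD_IDS; do
        run systemctl stop "ceph-osd@$id"
    done
    # Whole-device copy, so the partition table and its GUIDs are copied too.
    # On a larger target the backup GPT header will not sit at the end of
    # the disk; `sgdisk -e` can relocate it.
    run dd if="$OLD_SSD" of="$NEW_SSD" bs=4M conv=fsync
}

# Step 5: run after the old SSD has been physically removed (step 4).
restart_osds() {
    for id in $OSD_IDS; do
        run systemctl start "ceph-osd@$id"
    done
}

# Step 7: run only once recovery has finished (step 6, see `ceph -s`).
finish() {
    run ceph osd unset noout
}

stop_and_clone
```

The three functions are kept separate because the physical swap and the wait for recovery happen between them.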

Would this be a viable method to replace a DB SSD? Is there any udev/serial number/UUID handling that would prevent this from working?
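
One way to see which identifiers are involved is to check what each OSD's `block.db` symlink points to. The sketch below assumes BlueStore OSDs mounted under `/var/lib/ceph/osd`; `OSD_ROOT` and `db_links` are hypothetical names used only for this illustration. If the links resolve to `/dev/disk/by-partuuid/...`, the partition GUIDs live in the GPT and a whole-device `dd` would copy them, but that is an assumption to verify rather than a confirmed answer.

```shell
#!/bin/sh
# Sketch only: list where each OSD's block.db symlink points.
# OSD_ROOT is an assumed mount location; adjust if the cluster differs.
OSD_ROOT="${OSD_ROOT:-/var/lib/ceph/osd}"

db_links() {
    for dir in "$OSD_ROOT"/ceph-*; do
        # Skip OSDs without a separate DB device (or an unmatched glob).
        [ -L "$dir/block.db" ] || continue
        echo "${dir##*/} -> $(readlink "$dir/block.db")"
    done
}

db_links
```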

Or is there another, less hacky way to replace a DB SSD without moving too much data?

Kind regards,
Caspar
