Hello,

The way Wido explained is the correct way, I won't deny that. However, last year we had problems with our SSD disks: they did not perform well, so we decided to replace all of them. Because the replacement done by Ceph caused high load and downtime on the clients (which was the reason we wanted to replace the disks in the first place), we did it the rsync way, and we did not encounter any problems with that.

It is very important to flush the journal before syncing and to correct the journal symlink before starting the new disk. Also make sure you disarm the old disk: since it has the same ID, you will run into a lot of problems if you re-enable that disk by accident.

So yes, it is possible, but it is very dangerous and not recommended.

Attached is the script we used to assist with the migration (we were on hammer back then). I'm not sure it is the latest version we have. It formats a disk with the ceph-disk prepare command, mounts it the 'ceph' way, and then prints a series of commands to execute manually.

And again, a big warning: use at your own risk.

regards,
mart

On 12/16/2016 09:46 PM, Brian :: wrote:
> Given that you are all SSD, I would do exactly what Wido said -
> gracefully remove the OSD and gracefully bring up the OSD on the new SSD.
>
> Let Ceph do what it's designed to do. The rsync idea looks great on
> paper - not sure what issues you will run into in practice.
>
> On Fri, Dec 16, 2016 at 12:38 PM, Alessandro Brega
> <alessandro.brega1@xxxxxxxxx> wrote:
>> 2016-12-16 10:19 GMT+01:00 Wido den Hollander <wido@xxxxxxxx>:
>>>
>>>> On 16 December 2016 at 9:49, Alessandro Brega
>>>> <alessandro.brega1@xxxxxxxxx> wrote:
>>>>
>>>> 2016-12-16 9:33 GMT+01:00 Wido den Hollander <wido@xxxxxxxx>:
>>>>
>>>>>> On 16 December 2016 at 9:26, Alessandro Brega
>>>>>> <alessandro.brega1@xxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I'm running a ceph cluster on the 0.94.9-1trusty release, on XFS, for
>>>>>> RBD only. I'd like to replace some SSDs because they are close to
>>>>>> their TBW.
>>>>>>
>>>>>> I know I can simply shut down the OSD, replace the SSD, restart the
>>>>>> OSD, and Ceph will take care of the rest. However, I don't want to do
>>>>>> it this way, because it leaves my cluster in a degraded state for the
>>>>>> duration of the rebalance/backfilling.
>>>>>>
>>>>>> I'm thinking about this process:
>>>>>> 1. keep the old OSD running
>>>>>> 2. copy all data from the current OSD folder to the new OSD folder
>>>>>>    (using rsync)
>>>>>> 3. shut down the old OSD
>>>>>> 4. redo step 2 to pick up the latest changes
>>>>>> 5. restart the OSD with the new folder
>>>>>>
>>>>>> Are there any issues with this approach? Do I need any special rsync
>>>>>> flags (rsync -avPHAX --delete-during)?
>>>>>>
>>>>> Indeed, X for transferring xattrs, but also make sure that the
>>>>> partitions are GPT with the proper GUIDs.
>>>>>
>>>>> I would never go for this approach in a running setup. Since it's an
>>>>> SSD cluster I wouldn't worry about the rebalance and would just have
>>>>> Ceph do the work for you.
>>>>>
>>>> Why not, if it's completely safe? It's much faster (local copy), doesn't
>>>> put load on the network (local copy), much safer (2-3 minutes of
>>>> degraded time instead of 1-2 hours for a 2TB SSD), and it's really
>>>> simple (2 rsync commands). Thank you.
>>>>
>>> I wouldn't say it is completely safe, hence my remark. If you copy, indeed
>>> make sure you copy all the xattrs, but also make sure the partition tables
>>> match.
>>>
>>> That way it should work, but it's not a 100% guarantee.
>>>
>> Ok, thanks! Can a ceph dev confirm? I do not want to lose any data ;)
>>
>> Alessandro
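A minimal sanity check along the lines Wido mentions above - that the new data partition carries the usual Ceph OSD GPT type GUID and that the xattrs actually survived the rsync - could look roughly like the sketch below. It is not part of the attached script; the file name, the sgdisk/getfattr parsing and the idea of comparing one sampled object file between the old and new OSD are assumptions, so adapt and test it before relying on it.

#!/usr/bin/env python
# check_new_osd.py -- hypothetical pre-cutover check, NOT part of the attached migration script.
import subprocess
import sys

# Well-known GPT type GUID that ceph-disk assigns to OSD data partitions.
CEPH_OSD_TYPE_GUID = "4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D"

def partition_type_guid(disk, partnum=1):
    # Parse 'sgdisk -i <n> <disk>' output and return the partition type GUID.
    out = subprocess.check_output(["sgdisk", "-i", str(partnum), disk])
    for line in out.splitlines():
        if line.startswith("Partition GUID code:"):
            return line.split(":", 1)[1].split()[0]
    return None

def xattr_dump(path):
    # Dump the user.* xattrs of one file so the old and new copy can be compared.
    out = subprocess.check_output(["getfattr", "-d", "--absolute-names", path])
    return out.split("\n", 1)[-1]   # drop the leading '# file: ...' line

if __name__ == "__main__":
    # usage: check_new_osd.py <new disk, e.g. /dev/sdo> <object file on old osd> <same file on new mount>
    disk, old_file, new_file = sys.argv[1], sys.argv[2], sys.argv[3]

    guid = partition_type_guid(disk)
    if guid != CEPH_OSD_TYPE_GUID:
        print "WARNING: %s1 has type GUID %s, not the ceph OSD data type" % (disk, guid)

    if xattr_dump(old_file) != xattr_dump(new_file):
        print "WARNING: xattrs differ between %s and %s (check rsync -X)" % (old_file, new_file)
    else:
        print "xattrs match for the sampled file"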
#!/usr/bin/env python

import argparse
import os
import stat
import sys
import re
from subprocess import call

#### WARNING ####
#### THIS IS A VERY DANGEROUS SCRIPT. NO GUARANTEES THIS WILL WORK FOR YOU ####

print "Please read and understand the script before executing"
sys.exit(1)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", "--destination", type=str, required=True,
                        help="destination disk")
    parser.add_argument("-s", "--source", type=str, required=True,
                        help="source osd number")
    parser.add_argument("--force", help="force migration", action="store_true")

    force = False
    parted = False

    args = parser.parse_args()
    if args.force:
        force = True

    osd = args.source
    disk_id = args.destination
    disk = '/dev/' + disk_id

    # First we are going to check that the provided disk is indeed a block
    # device and has an empty partition table.
    print 'Examining disk: %s' % (disk)

    # Does the device exist?
    if not os.path.exists(disk):
        print 'Abort: disk device not found'
        sys.exit(1)

    mode = os.stat(disk).st_mode
    if not stat.S_ISBLK(mode):
        print 'Abort: disk device is not a block device'
        sys.exit(1)

    if not re.match('^sd[a-z]{1,2}$', disk_id):
        print 'Abort: disk is not a full disk. Did you provide a partition or lvm device?'
        sys.exit(1)

    disk_check = disk + '1'
    if os.path.exists(disk_check):
        if force:
            parted = True
        else:
            print 'Abort: there are already partitions on this disk'
            print '       please zap the partition table if you'
            print '       want to use this disk'
            sys.exit(1)

    # Ok. Disk is fine.
    # Check the source osd.
    print 'Examining osd: %s' % (osd)

    if not re.match('^[0-9]+$', osd):
        print 'Abort: osd is not a numeric value'
        sys.exit(1)

    osd_path = '/var/lib/ceph/osd/ceph-' + osd
    if not os.path.isdir(osd_path):
        print 'Abort: path for osd not found'
        sys.exit(1)

    if not os.path.isfile(osd_path + '/whoami'):
        print 'Abort: whoami file not found for osd'
        sys.exit(1)

    # Ok. Looks fine.

    tmp_mount = '/mnt/ceph-' + osd
    if not os.path.exists(tmp_mount):
        os.mkdir(tmp_mount)
    if not os.path.isdir(tmp_mount):
        print 'Abort: failed to make tmp mountpoint: %s' % (tmp_mount)
        sys.exit(1)

    # Prepare the disk (suppress activation so the new OSD is not started automatically).
    call(['ceph-disk', 'suppress-activate', disk])
    if not parted:
        call(['ceph-disk', 'prepare', disk])

    # Mount the new data partition on the temporary mountpoint.
    print "Mounting disk %s at %s" % (disk, tmp_mount)
    part = disk + '1'
    call(['mount', '-o', 'rw,noatime,attr2,inode64,noquota', part, tmp_mount])
    print "OK"

    # Print the commands that still have to be executed manually.
    print " You should start rsync now"
    # store usage of this disk
    print "   df -h | grep ceph-%s > /opt/df-%s" % (osd, osd)
    # stop the running osd
    print "   stop ceph-osd id=%s" % (osd)
    # flush the journal
    print "   ceph-osd --flush-journal -i %s" % (osd)
    # sync, without overwriting the journal symlink & uuid
    print "   rsync -av -HAX --delete --exclude 'fsid' --exclude 'journal' --exclude 'journal_uuid' %s %s" % (osd_path + '/', tmp_mount + '/')
    # disarm the old disk
    print "   cd %s && mv whoami whoami.old" % (osd_path)
    print "   cd %s && mv active active.old" % (osd_path)
    # unmount the old & new disk
    print "   cd ~ && umount %s" % (osd_path)
    print "   umount %s" % (tmp_mount)
    # mount the new disk on the normal path
    print "   mount -o rw,noatime,attr2,inode64,noquota %s %s" % (part, osd_path)
    # recreate the journal
    print "   ceph-osd -i %s --mkjournal" % (osd)
    # start the osd again
    print "   start ceph-osd id=%s" % (osd)
    print " "

# 1910  ceph-disk suppress-activate /dev/sdo
# 1911  ceph-disk suppress-activate /dev/sdp
# 1912  ceph-disk prepare /dev/sdo
# 1913  ceph-disk prepare /dev/sdp
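(Hypothetical usage, assuming you save the attachment as migrate_osd.py and remove the safety exit near the top: run it as root with the source OSD number and the bare destination device name, e.g. "python migrate_osd.py -s 12 -d sdo". It then prepares /dev/sdo, mounts /dev/sdo1 on /mnt/ceph-12, and prints the stop/flush-journal/rsync/remount/mkjournal commands for you to run by hand.)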