For #1, it depends on what you mean by fast. I wouldn't worry about it taking 15 minutes.
If you mark the old OSD out, ceph will start remapping data immediately, including a bunch of PGs on unrelated OSDs. Once you replace the disk and put the same OSD ID back in the same host, the CRUSH map will be back to what it was before you started. All of those remaps on unrelated OSDs will reverse. They'll complete fairly quickly, because they only have to backfill the data that was written during the remap.
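For reference, a replace-in-place that keeps the same OSD ID looks roughly like this. This is a sketch, not a definitive procedure: it assumes a Filestore-era deployment, osd.12 as the failed OSD, and /dev/sdX as the replacement disk -- all placeholders you'd substitute for your own cluster.

```shell
# Sketch: replace a failed disk while keeping the same OSD id (osd.12
# and /dev/sdX are assumed placeholders; adapt to your cluster).

# Stop the failed daemon, but leave the OSD "in" so CRUSH is unchanged
sudo service ceph stop osd.12

# Optionally stop ceph from marking it out while you swap the hardware
ceph osd set noout

# ... physically replace the disk, then rebuild the filesystem ...
sudo mkfs.xfs /dev/sdX
sudo mount /dev/sdX /var/lib/ceph/osd/ceph-12

# Re-initialize the OSD with the same id and register its new key
sudo ceph-osd -i 12 --mkfs --mkkey
ceph auth del osd.12
ceph auth add osd.12 osd 'allow *' mon 'allow rwx' \
    -i /var/lib/ceph/osd/ceph-12/keyring

# Start it back up; only the writes made during the outage backfill
sudo service ceph start osd.12
ceph osd unset noout
```

Because the OSD ID and CRUSH position never change, the only data movement is the backfill of writes that landed elsewhere while the disk was out.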
I prefer #1. ceph pg repair will just overwrite the replicas with whatever the primary OSD has, which may copy bad data from your bad OSD over good replicas. So #2 has the potential to corrupt the data. #1 will delete the data you know is bad, leaving only good data behind to replicate. Once ceph pg repair gets more intelligent, I'll revisit this.
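To see what you'd actually be repairing before choosing either approach, you can list the inconsistent PGs first. A minimal sketch (the pg id 2.5f below is a made-up placeholder):

```shell
# Sketch: find inconsistent PGs before deciding how to handle them.

# Scrub errors surface as "inconsistent" PGs in health detail
ceph health detail | grep inconsistent

# Or pull them out of the full pg dump
ceph pg dump | grep inconsistent

# Only if you trust the primary's copy: overwrite the replicas from it
# (this is exactly the risk described above -- a bad primary wins)
ceph pg repair 2.5f
```

The repair step is the dangerous one for approach #2: it blindly pushes the primary's copy, so run it only when you're confident the primary isn't the disk that's failing.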
I also prefer the simplicity. Whether the disk is dead or just corrupt, it's handled the same way.
On Sun, Nov 9, 2014 at 7:25 PM, GuangYang <yguang11@xxxxxxxxxxx> wrote:
In terms of disk replacement, to avoid migrating data back and forth, are the below two approaches reasonable?
1. Keep the OSD in, do an ad-hoc disk replacement and provision a new OSD (keeping the same OSD id), and then trigger data migration. This way the data migration only happens once; however, it does require operators to replace the disk very quickly.
2. Move the data on the broken disk to a new disk completely and use Ceph to repair bad objects.
Thanks,
Guang
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com