It's part of the way CRUSH hashing works. Any change to the CRUSH map changes the placement calculation slightly, so some PGs end up mapped to different OSDs.
BTW, it's safer to remove OSDs and hosts by first marking the OSDs OUT while leaving them UP (ceph osd out OSDID). That triggers the remapping while keeping the OSDs in the pool, so you still have all of your replicas. Your PGs will still show as degraded (there are plans to differentiate the "not enough copies" degraded from the "data not in the correct place" degraded), but you'll always have num_replica copies.
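For example, the sequence looks roughly like this (osd.12 is just a placeholder ID, and the stop command depends on your init system):

  # Mark the OSD out but leave the daemon up, so its PGs can be
  # copied off while every replica stays readable.
  ceph osd out 12

  # Watch recovery and wait for everything to go back to active+clean.
  ceph -w

  # Then stop the daemon and remove the OSD from the CRUSH map,
  # the auth database, and the OSD map.
  service ceph stop osd.12       # or: stop ceph-osd id=12
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm 12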
If you mark the OSDs OUT, wait for the remapping to finish, and then remove the OSDs and host from the CRUSH map, there will still be some data migration, because removing the now-empty host bucket is itself a change to the CRUSH map.
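If you want to see how big that final move will be before you do it, you can edit a copy of the CRUSH map offline and compare test mappings (the file names, rule number, and replica count below are just placeholders; crushtool ships with Ceph):

  # Grab the current CRUSH map and make an edited copy with the
  # empty host bucket deleted.
  ceph osd getcrushmap -o current.bin
  crushtool -d current.bin -o current.txt
  cp current.txt edited.txt       # then delete the host bucket in edited.txt
  crushtool -c edited.txt -o edited.bin

  # --show-mappings prints the OSDs each test input maps to under a rule;
  # diffing the two outputs shows how many placements would move.
  crushtool -i current.bin --test --show-mappings --rule 0 --num-rep 3 > current.map
  crushtool -i edited.bin  --test --show-mappings --rule 0 --num-rep 3 > edited.map
  diff current.map edited.map | grep -c '^>'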
Ceph is also really good at handling multiple changes in a row. For example, I had to reformat all of my OSDs because I chose my mkfs.xfs parameters poorly. I removed the OSDs without draining them first, which caused a lot of remapping. I then quickly formatted the OSDs and put them back in. The CRUSH map went back to what it started with, and the only remapping required was to re-populate the newly formatted OSDs.
On Wed, Oct 15, 2014 at 9:06 AM, Chad Seys <cwseys@xxxxxxxxxxxxxxxx> wrote:
Hi all,
When I remove all OSDs on a given host, then wait for all objects (PGs?) to
be active+clean, then remove the host (ceph osd crush remove hostname),
that causes the objects to shuffle around the cluster again.
Why does the CRUSH map depend on hosts that no longer have OSDs on them?
A wonderment question,
C.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com