Re: osd crush reweight 0 on "out" OSD causes backfilling?

Christian Sarrasin <c.nntp@xxxxxxxxxxxxxxxxxx> · Tue, 13 Feb 2018 20:45:58 +0100

Thanks!  I'm still puzzled as to _what_ data is moving, given that the
OSD was previously "out" and didn't host any PGs (according to pg dump).
The host had only one other OSD, which was already "out" and had zero
weight.  It looks like Ceph is moving other data that wasn't hosted on
the re-weighted OSD.

Just to reiterate my question: from what I'm reading here, it sounds
like the best practice for removing an OSD from the cluster is to run:

1. ceph osd crush reweight osd.X 0
2. ceph osd out osd.X
<further steps omitted; refer to [1]>

The official doc [1] suggests doing just #2.

[1]
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual
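
The two steps listed above can be sketched as a small dry-run helper.
This is only an illustration: the function name drain_cmds and the OSD
id 7 are hypothetical, and the helper prints the commands instead of
running them, so it can be tried on a machine without Ceph.  The later
removal steps are deliberately left out; see [1] for those.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the first two steps for
# taking an OSD out of service, in the order discussed in this thread.
drain_cmds() {
    osd_id="$1"
    # Step 1: zero the CRUSH weight, so the host bucket weight changes
    # once, while the OSD can still serve its data to other OSDs.
    echo "ceph osd crush reweight osd.${osd_id} 0"
    # Step 2: mark the OSD out.
    echo "ceph osd out osd.${osd_id}"
    # Further removal steps are documented in [1] and not repeated here.
}

drain_cmds 7
```

Waiting for backfill to finish between the two steps is left to the
operator here.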

On 13/02/18 14:38, David Turner wrote:
> An out osd still has a crush weight. Removing that osd or weighting it
> to 0 will change the weight of the host that it's in. That is why data
> moves again. There is a thread on the ML started by Sage about possible
> ways to deal with the double data shift when drives fail: data moves
> off when the OSD goes out, and then again when it is removed from the
> cluster.
> 
> If the drive was still readable when it was marked out, the best method
> is to weight it to 0 while it is still running so it can be used to
> offload its data. Also in this method, when you remove it from the
> cluster, there will not be any additional data movement.
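
A rough sketch of the arithmetic behind David's point, assuming the
host bucket's weight is simply the sum of the CRUSH weights of its
OSDs, whether they are "in" or "out".  The helper name host_weight,
the input format, and the weight 1.82 are made up for illustration;
they are not taken from a real cluster.

```shell
#!/bin/sh
# Hypothetical helper: sum the CRUSH weights read from stdin, one OSD
# per line in the form "name crush_weight state".  The state column is
# ignored on purpose: an "out" OSD still counts toward the host weight.
host_weight() {
    awk '{ sum += $2 } END { printf "%.2f\n", sum }'
}

# Before: osd.1 is "out" but still carries its CRUSH weight.
before="$(printf 'osd.0 0.00 out\nosd.1 1.82 out\n' | host_weight)"
# After: osd.1 has been crush-reweighted to 0.
after="$(printf 'osd.0 0.00 out\nosd.1 0.00 out\n' | host_weight)"

echo "host weight before reweight: $before"
echo "host weight after reweight:  $after"
```

Under that assumption the host's weight drops even though both OSDs
were already "out", which changes how CRUSH distributes PGs across
hosts and so moves data that was never on the re-weighted OSD.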

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com