Re: noout equivalent for temporary OSD rm?

On Wed, 8 Feb 2017, Henrik Korkuc wrote:
> On 17-02-08 16:23, Sage Weil wrote:
> > On Wed, 8 Feb 2017, John Spray wrote:
> > > So I've just finished upgrading my home cluster OSDs to bluestore by
> > > killing them one by one and then letting backfill happen to "new" OSDs
> > > on the same drives.  Hooray!
> > > 
> > > One slightly awkward thing I ran into was that even though I had noout
> > > set throughout, during the period between removing the old OSD and
> > > adding the "new" one, some PGs would of course get remapped (and start
> > > generating backfill IO to third party OSDs).  This does make sense
> > > when you think about it (noout doesn't make the cluster magically
> > > remember OSDs that have been removed), but is still an undesirable
> > > behaviour.
> > > 
> > > A) Do we currently have a mechanism to tell the cluster "even though I
> > > removed this OSD, don't go moving PGs around just yet"?  Should we add
> > > one?
> > There's 'ceph osd set norebalance'...
> > 
> > > B) Was there a way for me to avoid this by e.g. skipping the "osd rm
> > > X" and "osd crush rm osd.X" that I'm currently doing before adding the
> > > new OSD that will take the old OSD's ID?
> > This keeps coming up but I don't think we've ever proposed a good
> > solution.  Perhaps the simplest thing is to allow ceph-disk to take an OSD
> > id as an argument.  Normally this is probably a no-no since the OSD might
> > exist elsewhere and have real data on it, and you don't want multiple OSDs
> > with the same id, but we could make this safer by requiring that the
> > OSD be marked 'lost' before its id can be reused...
> what about "replace"? It could remove old osd from the crush and add new one
> in same location under new id, setting all params to be the same (weight,
> reweight, affinity)?

The problem is that one of the inputs to CRUSH's pseudorandom placement 
decision is the OSD id.  If the id changes, even if the OSD is in the same 
position in the hierarchy, that OSD will get a different pseudorandom 
subset of the PGs.
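
To make that concrete, here is a toy sketch of the effect.  It is not
CRUSH: it uses plain rendezvous (highest-score-wins) hashing over a flat
list of OSDs, and the names draw(), place() and mapping() are made up for
this illustration.  The only point is that when the id feeds the hash,
recreating the OSD under the same id reproduces the old mapping, while a
new id in the same slot ends up with a different set of PGs.

```python
import hashlib


def draw(pg, osd_id):
    """Deterministic pseudorandom score for a (PG, OSD id) pair."""
    digest = hashlib.sha256(f"{pg}:{osd_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(pg, osd_ids):
    """Pick the OSD with the highest score for this PG (toy stand-in for CRUSH)."""
    return max(osd_ids, key=lambda osd_id: draw(pg, osd_id))


def mapping(osd_ids, num_pgs=256):
    """Map every PG number to the OSD id chosen for it."""
    return {pg: place(pg, osd_ids) for pg in range(num_pgs)}


before = mapping([0, 1, 2, 3])
same_id = mapping([0, 1, 2, 3])  # OSD 2 recreated under the same id
new_id = mapping([0, 1, 7, 3])   # OSD 2 replaced by a new OSD with id 7

pgs_on_old = {pg for pg, osd in before.items() if osd == 2}
pgs_on_new = {pg for pg, osd in new_id.items() if osd == 7}

print("mapping unchanged when id is reused:", before == same_id)
print("PGs on osd.2 before:", len(pgs_on_old))
print("PGs on osd.7 after: ", len(pgs_on_new))
print("PGs common to both: ", len(pgs_on_old & pgs_on_new))
```

In this toy model the PGs that land on the new id are not the ones the
old id held, so data moves to and from third-party OSDs even though the
position and weight are unchanged.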

sage