On Wed, 9 Dec 2015, Wei-Chung Cheng wrote:
> Hi Loic,
>
> I tried to reproduce this problem on my CentOS 7 box, but I could not
> hit the same issue.  This is my version:
> ceph version 10.0.0-928-g8eb0ed1 (8eb0ed1dcda9ee6180a06ee6a4415b112090c534)
> Could you describe it in more detail?
>
>
> Hi David, Sage,
>
> Most of the time, by the time we notice an OSD failure, the OSD is
> already in the `out` state.  We cannot avoid the redundant data
> movement unless we set the noout flag when the failure happens.  Is
> that right?  (That is, once the OSD goes into the `out` state, it
> triggers some redundant data movement.)
>
> Could we try the traditional hot-spare behavior?  (Keep some disks as
> spares and automatically replace the broken device.)
>
> That would let us replace the failed OSD before it goes into the `out`
> state.  Or should we always set noout?

I don't think there is a problem with 'out' if the osd id is reused and
the crush position remains the same.  And I expect the OSD will usually
be replaced by a disk with a similar size.  If the replacement is
smaller (or 0--removed entirely) then you get double movement, but if
it's the same size or larger I think it's fine.

The sequence would be something like

 up + in
 down + in
 5-10 minutes go by
 down + out (marked out by monitor)
 new replicas uniformly distributed across cluster
 days go by
 disk removed
 new disk inserted
 ceph-disk recreate ...
 recreates osd dir w/ the same id, new uuid
 on startup, osd adjusts crush weight (maybe.. usually by a smallish
   amount)
 up + in
 replicas migrate back to new device

sage
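For reference, the noout approach Wei-Chung mentions maps onto the
existing CLI roughly as sketched below; the osd/mon ids are placeholders,
and the "ceph-disk recreate" step is the proposed piece, so it only
appears as a comment:

  # stop the monitors from marking down osds out automatically, so a
  # failed osd stays down + in instead of triggering re-replication
  ceph osd set noout

  # watch the failed osd: with the flag set it should sit at down + in
  ceph osd tree

  # the "5-10 minutes go by" step above is governed by the monitor's
  # mon_osd_down_out_interval option (in seconds); check the running
  # value on a monitor host with, e.g.,
  ceph daemon mon.a config get mon_osd_down_out_interval

  # ... swap the disk and recreate the osd with the same id (the
  #     proposed "ceph-disk recreate ..." step), then start it ...

  # once the replacement osd is back up + in, clear the flag
  ceph osd unset noout

The trade-off is that while noout is set the affected PGs stay degraded
(no re-replication happens), so this only makes sense if the failed disk
will actually be replaced promptly.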