Re: CRUSH depends on host + OSD?

On Tuesday, October 21, 2014, Chad Seys <cwseys@xxxxxxxxxxxxxxxx> wrote:
Hi Craig,

> It's part of the way the CRUSH hashing works.  Any change to the CRUSH map
> causes the algorithm to change slightly.

Dan@cern could not replicate my observations, so I plan to follow his
procedure (fake create an OSD, wait for rebalance, remove fake OSD) in the
near future to see if I can replicate his! :)


> BTW, it's safer to remove OSDs and hosts by first marking the OSDs UP and
> OUT (ceph osd out OSDID).  That will trigger the remapping, while keeping
> the OSDs in the pool so you have all of your replicas.

I am under the impression that the procedure I posted does leave the OSDs in
the pool while an additional replication takes place: After "ceph osd crush
remove osd.osdnum" I see that the used % on the removed OSD slowly decreases
as the relocation of blocks takes place.

If my ceph-fu were strong enough I would try to find some block replicated
num_replicas+1 times so that my belief would be well-founded. :)

Also, after "ceph osd crush remove osd.osdnum" the OSD still shows up in
"ceph osd tree", but it is not attached to any server.  I think it might even
be marked UP and OUT, but I cannot confirm.

So I believe so far the approaches are equivalent.

BUT, I think that to keep an OSD out after using "ceph osd out OSDID" one
needs to turn off "auto in" or something.

I don't want to turn that off because in the past I had some slow drives which
would occasionally be marked "out".  If they stayed "out", the extra load on
the other drives could make those unresponsive too, so they would be marked
"out" as well, leading to a domino effect where too many drives are "out" and
the cluster goes down.

Now I have better hardware, but since the scenario exists, I'd rather avoid
it! :)

There are separate options for automatically marking new drives in versus marking in established ones. Should be in the docs! :)
-Greg

> If you mark the OSDs OUT, wait for the remapping to finish, and remove the
> OSDs and host from the CRUSH map, there will still be some data migration.

Yep, this is what I see.  But I find it weird.
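
A toy model may help show why the second step still moves data: "ceph osd out" only sets the OSD's reweight to 0, while the host bucket's CRUSH weight still counts that OSD, so host selection is unchanged; "ceph osd crush remove" then lowers the host's weight, and some placements shift to other hosts. The sketch below is NOT Ceph's CRUSH code. It assumes a simplified straw2-style draw, SHA-256 standing in for CRUSH's own hash, one replica per PG, and hypothetical hosts hostA..hostC with two OSDs each.

```python
import hashlib
import math

# Toy model only: a simplified straw2-style CRUSH descent, NOT Ceph's code.
# Host and OSD names below are hypothetical; SHA-256 stands in for CRUSH's hash.


def unit_hash(*parts):
    """Deterministic pseudo-random value in (0, 1] derived from the inputs."""
    digest = hashlib.sha256("-".join(map(str, parts)).encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def straw2_pick(x, r, items):
    """Pick one name from {name: weight}: the highest ln(u) / weight wins."""
    best, best_draw = None, -math.inf
    for name, weight in sorted(items.items()):
        if weight <= 0:
            continue  # zero-weight items can never be chosen
        draw = math.log(unit_hash(x, name, r)) / weight
        if draw > best_draw:
            best, best_draw = name, draw
    return best


def place(pg, crush, reweight, max_tries=50):
    """Map one PG to one OSD: choose a host, then an OSD inside that host.

    Host weights are the sums of their OSDs' CRUSH weights. If the chosen OSD
    is "out" (reweight 0), the whole descent is retried with the next r.
    max_tries is an arbitrary retry limit for this sketch.
    """
    host_weights = {h: sum(osds.values()) for h, osds in crush.items()}
    for r in range(max_tries):
        host = straw2_pick(pg, r, host_weights)
        osd = straw2_pick(pg, r, crush[host])
        if reweight.get(osd, 1.0) > 0:
            return osd
    return None


crush = {
    "hostA": {"osd.0": 1.0, "osd.1": 1.0},
    "hostB": {"osd.2": 1.0, "osd.3": 1.0},
    "hostC": {"osd.4": 1.0, "osd.5": 1.0},
}
pgs = range(1000)
before = {pg: place(pg, crush, {}) for pg in pgs}

# Like "ceph osd out 5": osd.5 keeps its CRUSH weight, so hostC still weighs 2.0.
out = {pg: place(pg, crush, {"osd.5": 0.0}) for pg in pgs}

# Like "ceph osd crush remove osd.5": hostC's CRUSH weight drops to 1.0.
shrunk = {h: {o: w for o, w in osds.items() if o != "osd.5"}
          for h, osds in crush.items()}
after = {pg: place(pg, shrunk, {}) for pg in pgs}

moved_on_out = [pg for pg in pgs if before[pg] != out[pg]]
moved_on_remove = [pg for pg in pgs if out[pg] != after[pg]]
print("moved by out:", len(moved_on_out))
print("moved by crush remove:", len(moved_on_remove))
```

In this toy model, every PG that moves on "out" was previously on osd.5, while the later CRUSH removal moves a further set of PGs, some of which leave hostC altogether because its weight dropped.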

>
>
> Ceph is also really good at handling multiple changes in a row.  For
> example, I had to reformat all of my OSDs because I chose my mkfs.xfs
> parameters poorly.  I removed the OSDs, without draining them first, which
> caused a lot of remapping.  I then quickly formatted the OSDs, and put them
> back in.  The CRUSH map went back to what it started with, and the only
> remapping required was to re-populate the newly formatted OSDs.

In this case you'd be living with num_replicas-1 for a while.  Sounds
exciting!  :)

Thanks,
Chad.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Software Engineer #42 @ http://inktank.com | http://ceph.com