Re: CRUSH odd bucket affinity / persistence

Nick Fisk <nick@xxxxxxxxxx> · Sun, 13 Sep 2015 16:47:38 +0100

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> deeepdish
> Sent: 13 September 2015 02:47
> To: Johannes Formann <mlmail@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  CRUSH odd bucket affinity / persistence
> 
> Johannes,
> 
> Thank you, "osd crush update on start = false" did the trick.   I wasn’t aware
> that ceph has automatic placement logic for OSDs
> (http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/9035).
> This brings up a best practice question.
> 
> How is the configuration of OSD hosts with multiple storage types (e.g.
> spinners + flash/ssd), typically implemented in the field from a crush map /
> device location perspective?   Preference is for a scale out design.

I use something based on this script:

https://gist.github.com/wido/5d26d88366e28e25e23d

It is wired in through the crush location hook config value in ceph.conf. With it you can place OSDs pretty much wherever you like.
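
To give an idea of what such a hook looks like, here is a minimal sketch. It is not the script linked above. Ceph calls the hook with --cluster, --id and --type and reads a single line of CRUSH location from its stdout. The SSD OSD ids (10 and 11, taken from this thread), the "-ssd" host bucket suffix and the root names are assumptions for illustration only.

```python
#!/usr/bin/env python3
"""Sketch of a CRUSH location hook.

Ceph runs the hook as: <hook> --cluster <name> --id <osd-id> --type osd
and expects one line of key=value CRUSH location pairs on stdout.
"""
import argparse
import socket

# Assumption: OSD ids that sit on SSDs (osd.10 and osd.11 in this thread).
SSD_OSD_IDS = {"10", "11"}


def crush_location(osd_id, hostname):
    """Return the CRUSH location string for one OSD."""
    if osd_id in SSD_OSD_IDS:
        # A separate host bucket per media type keeps the two trees apart.
        return "root=ssd host={}-ssd".format(hostname)
    return "root=default host={}".format(hostname)


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--cluster", default="ceph")
    parser.add_argument("--id", default="")
    parser.add_argument("--type", default="osd")
    # Ignore anything unexpected instead of failing OSD startup.
    args, _unknown = parser.parse_known_args(argv)
    short_host = socket.gethostname().split(".")[0]
    print(crush_location(args.id, short_host))


if __name__ == "__main__":
    main()
```

A real hook would more likely detect the media type from the device itself than from a hard-coded list of ids.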

> 
> In addition to the SSDs which are used for an EC cache tier, I’m also planning a
> 5:1 ratio of spinners to SSD for journals.   In this case I want to implement
> availability groups within the OSD host itself.
> 
> e.g. in a 26-drive chassis, there will be 6 SSDs + 20 spinners.   [2 SSDs for
> replicated cache tier, 4 SSDs will create 4 availability groups of 5 spinners
> each]   The idea is to have CRUSH take into account SSD journal failure
> (affecting 5 spinners).

By default Ceph makes the host the smallest failure domain, so each replica of an object lives on a different host. Because of that, I'm not sure there is any benefit in telling CRUSH that several OSDs share one journal. Whether you lose 1 OSD, the 5 OSDs behind one journal, or all the OSDs in a server, you lose at most one copy of any object, so there shouldn't be any difference to the possibility of data loss. Or have I misunderstood your question?
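
To illustrate the reasoning, here is a toy model (not CRUSH itself). It enumerates every placement that puts one replica on each of three distinct hosts and counts how many replicas survive a failure. The host names, OSD ids and replica count are made up for the example.

```python
import itertools

# Toy cluster: 3 hosts with 4 OSDs each (all values are illustrative).
HOSTS = {
    "node1": [0, 1, 2, 3],
    "node2": [4, 5, 6, 7],
    "node3": [8, 9, 10, 11],
}
REPLICAS = 3


def placements(hosts, replicas):
    """Yield every way to put one replica on each of `replicas` distinct hosts."""
    for chosen in itertools.combinations(sorted(hosts), replicas):
        for osds in itertools.product(*(hosts[h] for h in chosen)):
            yield osds


def min_surviving_replicas(hosts, replicas, failed_osds):
    """Smallest number of replicas left for any placement after a failure."""
    failed = set(failed_osds)
    return min(
        sum(1 for osd in osds if osd not in failed)
        for osds in placements(hosts, replicas)
    )


# One OSD, two OSDs sharing a journal, and the whole host.
one_osd = min_surviving_replicas(HOSTS, REPLICAS, [0])
journal_group = min_surviving_replicas(HOSTS, REPLICAS, [0, 1])
whole_host = min_surviving_replicas(HOSTS, REPLICAS, HOSTS["node1"])
print(one_osd, journal_group, whole_host)  # prints: 2 2 2
```

In all three cases the worst-hit placement still has two of its three replicas, because any failure confined to one host can only touch one replica.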

> 
> Thanks.
> 
> 
> 
> On Sep 12, 2015, at 19:11 , Johannes Formann <mlmail@xxxxxxxxxx> wrote:
> 
> Hi,
> 
> 
> I’m having a (strange) issue with OSD bucket persistence / affinity on my test
> cluster.
> 
> The cluster is PoC / test, by no means production.   Consists of a single OSD /
> MON host + another MON running on a KVM VM.
> 
> Out of 12 OSDs I’m trying to get osd.10 and osd.11 to be part of the ssd
> bucket in my CRUSH map.   This works fine either when editing the CRUSH
> map by hand (export, decompile, edit, compile, import), or via the ceph
> osd crush set command:
> 
> "ceph osd crush set osd.11 0.140 root=ssd"
> 
> I’m able to verify that the OSD / MON host and another MON I have running
> see the same CRUSH map.
> 
> After rebooting the OSD / MON host, both osd.10 and osd.11 become part of the
> default bucket.   How can I ensure that OSDs persist in their configured
> buckets?
> 
> I guess you have set "osd crush update on start = true"
> (http://ceph.com/docs/master/rados/operations/crush-map/ ) and only the
> default „root“-entry.
> 
> Either fix the „root“ entry in the ceph.conf or set osd crush update on start =
> false.
> 
> greetings
> 
> Johannes
