On Mon, 5 Feb 2018, Gregory Farnum wrote:
> On Mon, Feb 5, 2018 at 3:23 AM Caspar Smit <casparsmit@xxxxxxxxxxx> wrote:
>
> > Hi Gregory,
> >
> > Thanks for your answer.
> >
> > I had to add another step emit to your suggestion to make it work:
> >
> > step take default
> > step chooseleaf indep 4 type host
> > step emit
> > step take default
> > step chooseleaf indep 4 type host
> > step emit
> >
> > However, now the same OSD is chosen twice for every PG:
> >
> > # crushtool --test -i compiled-crushmap-new --rule 1 --show-mappings --x 1 --num-rep 8
> > CRUSH rule 1 x 1 [5,9,3,12,5,9,3,12]
>
> Oh, that must be because it has the exact same inputs on every run.
> Hrmmm... Sage, is there a way to seed them differently? Or do you have any
> other ideas? :/

Nope.  The CRUSH rule isn't meant to work like that.

> > I'm wondering why something like this won't work (crushtool test ends up
> > empty):
> >
> > step take default
> > step chooseleaf indep 4 type host

Yeah, s/chooseleaf/choose/ and it should work!

s

> > step choose indep 2 type osd
> > step emit
>
> Chooseleaf is telling CRUSH to go all the way down to individual OSDs. I'm
> not quite sure what happens when you then tell it to pick OSDs again, but
> obviously it's failing (as the instruction is nonsense) and emitting an
> empty list.
>
> > # crushtool --test -i compiled-crushmap-new --rule 1 --show-mappings --x 1 --num-rep 8
> > CRUSH rule 1 x 1 []
> >
> > Kind regards,
> > Caspar Smit
> >
> > 2018-02-02 19:09 GMT+01:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
> >
> >> On Fri, Feb 2, 2018 at 8:13 AM, Caspar Smit <casparsmit@xxxxxxxxxxx> wrote:
> >> > Hi all,
> >> >
> >> > I'd like to set up a small cluster (5 nodes) using erasure coding. I would
> >> > like to use k=5 and m=3.
> >> > Normally you would need a minimum of 8 nodes (preferably 9 or more) for
> >> > this.
> >> >
> >> > Then I found this blog:
> >> > https://ceph.com/planet/erasure-code-on-small-clusters/
> >> >
> >> > This sounded ideal to me, so I started building a test setup using the
> >> > 5+3 profile.
> >> >
> >> > Changed the erasure ruleset to:
> >> >
> >> > rule erasure_ruleset {
> >> >         ruleset X
> >> >         type erasure
> >> >         min_size 8
> >> >         max_size 8
> >> >         step take default
> >> >         step choose indep 4 type host
> >> >         step choose indep 2 type osd
> >> >         step emit
> >> > }
> >> >
> >> > Created a pool, and now every PG has 8 shards on 4 hosts with 2 shards
> >> > each. Perfect.
> >> >
> >> > But then I tested a node failure; no problem again, all PGs stay active
> >> > (most undersized+degraded, but still active). Then after 10 minutes the
> >> > OSDs on the failed node were all marked as out, as expected.
> >> >
> >> > I waited for the data to be recovered to the other (fifth) node, but that
> >> > doesn't happen; there is no recovery whatsoever.
> >> >
> >> > Only when I completely remove the down+out OSDs from the cluster is the
> >> > data recovered.
> >> >
> >> > My guess is that the "step choose indep 4 type host" chooses 4 hosts
> >> > beforehand to store data on.
> >>
> >> Hmm, basically, yes. The basic process is:
> >>
> >> > step take default
> >>
> >> Take the default root.
> >>
> >> > step choose indep 4 type host
> >>
> >> Choose four hosts that exist under the root. *Note that at this layer,
> >> it has no idea what OSDs exist under the hosts.*
> >>
> >> > step choose indep 2 type osd
> >>
> >> Within each host chosen above, choose two OSDs.
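A quick way to sanity-check what a rule like this actually maps to is to run
crushtool over a range of inputs rather than the single --x 1 used above; a
rough sketch, reusing the compiled map name from the commands quoted earlier
and assuming the rule is still number 1:

   crushtool --test -i compiled-crushmap-new --rule 1 --num-rep 8 \
       --min-x 0 --max-x 99 --show-mappings
   # each mapping should list 8 OSDs: 2 per host across 4 distinct hosts
   crushtool --test -i compiled-crushmap-new --rule 1 --num-rep 8 \
       --min-x 0 --max-x 99 --show-bad-mappings
   # prints only the inputs for which CRUSH could not fill all 8 slots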
> >>
> >> Marking out an OSD does not change the weight of its host, because that
> >> would cause massive data movement across the whole cluster on a single
> >> disk failure. The "chooseleaf" commands deal with this (because if
> >> they fail to pick an OSD within the host, they will back out and go
> >> for a different host), but that doesn't work when you're doing
> >> independent "choose" steps.
> >>
> >> I don't remember the implementation details well enough to be sure,
> >> but you *might* be able to do something like
> >>
> >> step take default
> >> step chooseleaf indep 4 type host
> >> step take default
> >> step chooseleaf indep 4 type host
> >> step emit
> >>
> >> And that will make sure you get at least 4 OSDs involved?
> >> -Greg
> >>
> >> >
> >> > Would it be possible to do something like this:
> >> >
> >> > Create a 5+3 EC profile where every host has a maximum of 2 shards (so
> >> > 4 hosts are needed); in case of node failure -> recover data from the
> >> > failed node to the fifth node.
> >> >
> >> > Thank you in advance,
> >> > Caspar
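For reference, the crushtool commands quoted above fit the usual offline
workflow for trying out rule changes; a brief sketch with placeholder file
names (the edited map is only compiled and tested here, nothing is injected
back into the cluster):

   ceph osd getcrushmap -o crushmap.bin         # dump the current CRUSH map
   crushtool -d crushmap.bin -o crushmap.txt    # decompile to editable text
   # edit the rule in crushmap.txt, then recompile and test the result with
   # crushtool --test as shown earlier in the thread:
   crushtool -c crushmap.txt -o compiled-crushmap-new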
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com