Re: Can't create erasure coded pools with k+m greater than hosts?

Just to clarify my situation: we have 2 datacenters with 3 hosts each, and 12 4TB disks per host (2 in a RAID with the OS installed, the remaining 10 used for Ceph). Right now I'm trying a single-DC installation and intend to migrate later to a multi-site setup mirroring DC1 to DC2, so that if we lose DC1 we can activate DC2. (Note: I have no idea how that is set up and haven't planned it at all; the idea was to get DC1 working first and set up the mirroring later.)

I don't think I'll be able to change the setup in any way, so my next question is: should I go with replica 3, or would erasure 2+1 be OK?

There's a very small chance we'll get 2 extra hosts for each DC in the near future, but we'll probably use up all the available storage space even sooner.

We're trying to use as much space as possible.
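
Doing the rough math on the ~109 TiB raw reported by "ceph -s" below (and ignoring overhead and the free space needed for recovery), replica 3 would leave about 109 / 3 = ~36 TiB usable, while erasure 2+1 would leave about 109 * 2/3 = ~72 TiB, which is why erasure coding is tempting.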

Thanks,

--
Salsa

Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, October 21, 2019 2:53 AM, Martin Verges <martin.verges@xxxxxxxx> wrote:

Just don't do such setups for production. It will bring a lot of pain and trouble and cause you problems.

Just take a cheap system, put some of the disks in it, and do a far better deployment than something like 4+2 on 3 hosts. Whatever happens to that cluster (for example a kernel update, reboot, or PSU failure) will cause you and all attached clients to stop all IO or even crash completely, which is especially bad with VMs on that Ceph cluster.

--
Martin Verges
Managing director

Mobile: +49 174 9335695

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263



Am Sa., 19. Okt. 2019 um 01:51 Uhr schrieb Chris Taylor <ctaylor@xxxxxxxxxx>:
Full disclosure - I have not created an erasure code pool yet!

I have been wanting to do the same thing that you are attempting and
have these links saved. I believe this is what you are looking for.

This link is for decompiling and recompiling the CRUSH map:

https://docs.ceph.com/docs/luminous/rados/operations/crush-map-edits/
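
If I remember right, the workflow it describes boils down to this (from memory, so double-check the exact flags against the docs):

ceph osd getcrushmap -o crushmap.bin       # dump the compiled CRUSH map
crushtool -d crushmap.bin -o crushmap.txt  # decompile it to editable text
(edit crushmap.txt and add your rule)
crushtool -c crushmap.txt -o crushmap.new  # recompile
ceph osd setcrushmap -i crushmap.new       # inject it back into the cluster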


This link is for creating the EC rules for 4+2 with only 3 hosts:

https://ceph.io/planet/erasure-code-on-small-clusters/
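
The rule from that post ends up looking roughly like the sketch below (written from memory and untested; the rule name and id are placeholders): pick 3 hosts, then 2 OSDs on each, so the 6 chunks of a 4+2 pool land as 2 chunks per host.

rule ec4x2_3hosts {
        id 2
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 3 type host
        step chooseleaf indep 2 type osd
        step emit
}

The pool then has to be created against that rule, something like "ceph osd pool create test_ec2 16 16 erasure ec4x2rs ec4x2_3hosts". Keep in mind that losing a whole host then costs you 2 chunks at once, which is all the redundancy m=2 gives you.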


I hope that helps!



Chris


On 2019-10-18 2:55 pm, Salsa wrote:
> Ok, I'm lost here.
>
> How am I supposed to write a crush rule?
>
> So far I managed to run:
>
> #ceph osd crush rule dump test -o test.txt
>
> So I can edit the rule. Now I have two problems:
>
> 1. What are the functions and operations to use here? Is there
> documentation anywhere about this?
> 2. How may I create a crush rule using this file? 'ceph osd crush rule
> create ... -i test.txt' does not work.
>
> Am I taking the wrong approach here?
>
>
> --
> Salsa
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Friday, October 18, 2019 3:56 PM, Paul Emmerich
> <paul.emmerich@xxxxxxxx> wrote:
>
>> Default failure domain in Ceph is "host" (see ec profile), i.e., you
>> need at least k+m hosts (but at least k+m+1 is better for production
>> setups).
>> You can change that to OSD, but that's not a good idea for a
>> production setup for obvious reasons. It's slightly better to write a
>> crush rule that explicitly picks two disks on 3 different hosts.
>>
>> Paul
>>
>> --
>>
>> Paul Emmerich
>>
>> Looking for help with your Ceph cluster? Contact us at
>> https://croit.io
>>
>> croit GmbH
>> Freseniusstr. 31h
>> 81247 München
>> www.croit.io
>> Tel: +49 89 1896585 90
>>
>> On Fri, Oct 18, 2019 at 8:45 PM Salsa salsa@xxxxxxxxxxxxxx wrote:
>>
>> > I have probably misunderstood how to create erasure coded pools, so I may be in need of some theory; I'd appreciate it if you could point me to documentation that may clarify my doubts.
>> > I have so far 1 cluster with 3 hosts and 30 OSDs (10 each host).
>> > I tried to create an erasure code profile like so:
>> > "
>> >
>> > ceph osd erasure-code-profile get ec4x2rs
>> >
>> > ==========================================
>> >
>> > crush-device-class=
>> > crush-failure-domain=host
>> > crush-root=default
>> > jerasure-per-chunk-alignment=false
>> > k=4
>> > m=2
>> > plugin=jerasure
>> > technique=reed_sol_van
>> > w=8
>> > "
>> > If I create a pool using this profile, or any profile where k+m > hosts, then the pool gets stuck.
>> > "
>> >
>> > ceph -s
>> >
>> > ========
>> >
>> >   cluster:
>> >     id:     eb4aea44-0c63-4202-b826-e16ea60ed54d
>> >     health: HEALTH_WARN
>> >             Reduced data availability: 16 pgs inactive, 16 pgs incomplete
>> >             2 pools have too many placement groups
>> >             too few PGs per OSD (4 < min 30)
>> >
>> >   services:
>> >     mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d)
>> >     mgr: ceph01(active, since 74m), standbys: ceph03, ceph02
>> >     osd: 30 osds: 30 up (since 2w), 30 in (since 2w)
>> >
>> >   data:
>> >     pools:   11 pools, 32 pgs
>> >     objects: 0 objects, 0 B
>> >     usage:   32 GiB used, 109 TiB / 109 TiB avail
>> >     pgs:     50.000% pgs not active
>> >              16 active+clean
>> >              16 creating+incomplete
>> >
>> > ceph osd pool ls
>> >
>> > =================
>> >
>> > test_ec
>> > test_ec2
>> > "
>> > The pool will never leave this "creating+incomplete" state.
>> > The pools were created like this:
>> > "
>> >
>> > ceph osd pool create test_ec2 16 16 erasure ec4x2rs
>> >
>> > ====================================================
>> >
>> > ceph osd pool create test_ec 16 16 erasure
>> >
>> > ===========================================
>> >
>> > "
>> > The default profile pool is created correctly.
>> > My profiles are like this:
>> > "
>> >
>> > ceph osd erasure-code-profile get default
>> >
>> > ==========================================
>> >
>> > k=2
>> > m=1
>> > plugin=jerasure
>> > technique=reed_sol_van
>> >
>> > ceph osd erasure-code-profile get ec4x2rs
>> >
>> > ==========================================
>> >
>> > crush-device-class=
>> > crush-failure-domain=host
>> > crush-root=default
>> > jerasure-per-chunk-alignment=false
>> > k=4
>> > m=2
>> > plugin=jerasure
>> > technique=reed_sol_van
>> > w=8
>> > "
>> > From what I've read it seems to be possible to create erasure coded pools with k+m higher than the number of hosts. Is this not so?
>> > What am I doing wrong? Do I have to create any special crush map rule?
>> > --
>> > Salsa
>> > Sent with ProtonMail Secure Email.
>> >
>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
