Re: erasure code profile

Hello Luis,

To find the EC profile that would be best in your environment, you need to answer two questions:

- How many disk or host failures do you want to tolerate? I understood from your email that you want to be able to lose one room, but won't you need a bit more, such as losing 1 disk (or 1 host) in another room while the first room is down?

- How many OSD nodes can (or will) you have per room, or will you derive that number from the EC profile you set up?

Once these two questions are answered, you can set the m parameter of the EC profile and then compute the k parameter so that each room needs the same minimum number of OSD nodes, namely (k+m)/3.

In every case you will then need to adapt the CRUSH ruleset associated with the EC profile so that it places exactly (k+m)/3 EC chunks per room; that is what keeps all your data accessible when one room is down.
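
For example, and only as a sketch (the rule name, the root "default" and the tunables are assumptions of mine, and your CRUSH map must already contain buckets of type "room"), after decompiling the CRUSH map with crushtool the rule attached to the EC pool could look like this, the second choose step being what gives you (k+m)/3 chunks per room:

rule ecpool_by_room {
        ruleset 1                          # "id 1" on Luminous and later
        type erasure
        min_size 3
        max_size 6                         # k+m, here for a 4+2 profile
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 3 type room      # select the 3 rooms
        step chooseleaf indep 2 type host  # (k+m)/3 OSDs in each room, here 2
        step emit
}

You would then recompile the map with crushtool -c, inject it with ceph osd setcrushmap -i, and point the EC pool at this rule.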

Suppose we only accept one room down and nothing more:

- if m=1, then k must be 2, as you already concluded, and you would only have 1 OSD node per room.

- if m=2, then applying the same rule gives k=4; you would need 2 OSD nodes per room and you would have to change the EC 4+2 ruleset to place 2 chunks per room.
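
As an illustration of that 4+2 case (a sketch only: the profile and pool names and the PG count are mine, and on pre-Luminous releases the options are called ruleset-failure-domain / ruleset-root while Luminous renamed them to crush-failure-domain / crush-root), the setup could look roughly like this, the pool then being switched to a rule placing 2 chunks per room such as the one sketched above:

ceph osd erasure-code-profile set ec42 k=4 m=2 ruleset-failure-domain=host
ceph osd pool create backup_ec 1024 1024 erasure ec42
ceph osd pool set backup_ec crush_rule ecpool_by_room   # "crush_ruleset <id>" before Luminous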

Now suppose you want to allow for more downtime, for example to be able to perform maintenance on one OSD node while one room is down: then you need at least m = (number of OSD nodes in 1 room) + 1.

- with 2 OSD nodes per room, m needs to be 3 and by deduction k=3, and the ruleset must place exactly (3+3)/3 = 2 chunks per room.

- with 3 OSD nodes per room, m = 3+1 = 4 and k = 3x3 - 4 = 5, and the ruleset must place 3 chunks per room.

This is a minimum, though; for a given EC profile (say 5+4) I would recommend having one spare OSD node per room, so that backfilling can happen inside a room when another OSD in that room is down.

Thus, if you can have 12 OSD nodes in total (4 OSD nodes per room), I would still use the EC 5+4 profile and change the ruleset to place exactly 3 chunks per room; the efficiency of your cluster will be about 55% (55 TiB usable per 100 TiB of raw capacity).
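
As a quick sanity check of that 5+4 layout (assuming the ruleset is adapted to place 3 chunks per room):

- chunks per room: (5+4)/3 = 3
- one room down: 9 - 3 = 6 chunks remain, and 6 >= k=5, so all your data stays accessible
- one more OSD node down in a surviving room: 5 chunks remain, which is exactly k, so still accessible but with no margin left until backfill completes
- usable capacity: k/(k+m) = 5/9, i.e. the roughly 55 TiB per 100 TiB of raw capacity mentioned above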

Also remember that you will still need a good network between the rooms (both bandwidth and latency) and powerful CPUs on the OSD nodes, since the EC chunks are computed all the time.

Best Regards,

Eric.

On 09/22/2017 10:39 AM, Luis Periquito wrote:
Hi all,

I've been trying to think what will be the best erasure code profile,
but I don't really like the one I came up with...

I have 3 rooms that are part of the same cluster, and I need to design
so we can lose any one of the 3.

As this is a backup cluster I was thinking of doing a k=2 m=1 code,
with ruleset-failure-domain=room, as the OSD tree is correctly built.

Can anyone think of a better profile?

thanks,
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

