Hi folks,

I am currently testing erasure-code-lrc (1) in a multi-room, multi-rack setup. The idea is to be able to repair a disk failure within the rack itself to lower bandwidth usage:

```bash
ceph osd erasure-code-profile set lrc_hdd \
  plugin=lrc \
  crush-root=default \
  crush-locality=rack \
  crush-failure-domain=host \
  crush-device-class=hdd \
  mapping=__DDDDD__DDDDD__DDDDD__DDDDD \
  layers='
  [
    [ "_cDDDDD_cDDDDD_cDDDDD_cDDDDD", "" ],
    [ "cDDDDDD_____________________", "" ],
    [ "_______cDDDDDD______________", "" ],
    [ "______________cDDDDDD_______", "" ],
    [ "_____________________cDDDDDD", "" ],
  ]' \
  crush-steps='[
    [ "choose", "room", 4 ],
    [ "choose", "rack", 1 ],
    [ "chooseleaf", "host", 7 ],
  ]'
```

The rule picks 4 out of 5 rooms and keeps the PG in one rack per room, as expected! However, it looks like the PG will not move to another rack or room if it is undersized or if the entire room or rack is down.

Questions:

* Am I missing something to allow LRC PGs to move across racks/rooms for repair?
* Is it even possible to build such a "multi-stage" crushmap?

Thanks for your help,
Ansgar

1) https://docs.ceph.com/en/quincy/rados/operations/erasure-code-jerasure/
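
For reference, the generated rule and the resulting mappings can be inspected as shown below. The commented rule body is only a sketch of what I would expect the crush-steps above to translate into; the rule name, rule id, and the OSD ids used in the test are placeholders and will differ on a real cluster:

```bash
# Show the profile and the CRUSH rules currently defined
ceph osd erasure-code-profile get lrc_hdd
ceph osd crush rule dump

# Decompile the crushmap to see the generated rule in text form
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Expected shape of the generated rule (sketch only; id/name/tunables differ):
#
# rule lrc_hdd_pool {
#     id 2
#     type erasure
#     step take default class hdd
#     step choose indep 4 type room
#     step choose indep 1 type rack
#     step chooseleaf indep 7 type host
#     step emit
# }

# Replay the rule offline with all 28 chunk positions and show the mappings.
# Re-running with --weight <osd-id> 0 simulates failed OSDs, which shows
# whether CRUSH picks replacements elsewhere or leaves the slots empty
# (2147483647 / NONE). The rule id (2) and OSD ids (12, 13) are examples.
crushtool -i crushmap.bin --test --rule 2 --num-rep 28 --show-mappings
crushtool -i crushmap.bin --test --rule 2 --num-rep 28 --show-mappings \
    --weight 12 0 --weight 13 0
```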