Hi,
I don't really have an answer; I just wanted to mention that I created
a tracker issue [1] because I believe there is a bug in the LRC plugin.
There hasn't been any response to it yet, though.
[1] https://tracker.ceph.com/issues/61861
Quoting Ansgar Jazdzewski <a.jazdzewski@xxxxxxxxxxxxxx>:
hi folks,
I am currently testing the erasure-code LRC plugin [1] in a multi-room,
multi-rack setup. The idea is to be able to repair disk failures within
the rack itself, to lower bandwidth usage:
```bash
ceph osd erasure-code-profile set lrc_hdd \
  plugin=lrc \
  crush-root=default \
  crush-locality=rack \
  crush-failure-domain=host \
  crush-device-class=hdd \
  mapping=__DDDDD__DDDDD__DDDDD__DDDDD \
  layers='[
    [ "_cDDDDD_cDDDDD_cDDDDD_cDDDDD", "" ],
    [ "cDDDDDD_____________________", "" ],
    [ "_______cDDDDDD______________", "" ],
    [ "______________cDDDDDD_______", "" ],
    [ "_____________________cDDDDDD", "" ],
  ]' \
  crush-steps='[
    [ "choose", "room", 4 ],
    [ "choose", "rack", 1 ],
    [ "chooseleaf", "host", 7 ],
  ]'
```
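To see which placements the rule actually produces, and whether it still
finds a complete mapping once a rack or room is gone, one can run
crushtool's test mode against an exported CRUSH map. A rough sketch only;
the rule name `lrc_hdd`, the numeric rule id and the osd ids below are
placeholders, not values from a real cluster:

```bash
# export the compiled CRUSH map of the cluster
ceph osd getcrushmap -o crushmap.bin

# if no rule exists for the profile yet, one can be created from it
ceph osd crush rule create-erasure lrc_hdd lrc_hdd
ceph osd crush rule dump lrc_hdd        # note the numeric rule_id

# show the 28-chunk placements the rule produces
crushtool -i crushmap.bin --test --rule <rule_id> --num-rep 28 --show-mappings

# simulate a dead rack/room by zero-weighting its OSDs (osd ids are examples)
# and list placements that could not be filled completely
crushtool -i crushmap.bin --test --rule <rule_id> --num-rep 28 \
  --weight 10 0 --weight 11 0 --weight 12 0 --show-bad-mappings
```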
The rule picks 4 out of 5 rooms and keeps the PG within one rack, as
expected! However, it looks like the PG will not move to another room if
it is undersized or if an entire room or rack is down.
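For what it's worth, one way to confirm that the chunks really stay
missing is to look at the undersized PGs and their up/acting sets; the
pool name and pg id below are only examples:

```bash
# pool name "lrc_pool" and pg id "12.7" are examples only
ceph health detail | grep -i undersized
ceph pg ls-by-pool lrc_pool undersized   # undersized PGs of the pool
ceph pg dump_stuck undersized            # PGs stuck in the undersized state
ceph pg 12.7 query                       # compare the "up" and "acting" sets
```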
Questions:
* Am I missing something needed to allow LRC PGs to move across
racks/rooms for repair?
* Is it even possible to build such a 'multi-stage' CRUSH map?
Thanks for your help,
Ansgar
[1] https://docs.ceph.com/en/quincy/rados/operations/erasure-code-jerasure/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx