Hello,

We would like to run a RAID1 array between local storage and an RBD device. This would allow us to sustain network or Ceph failures, and it would also give better read performance, as we would mark the RBD device write-mostly in mdadm. Basically, we would like to implement https://discord.com/blog/how-discord-supercharges-network-disks-for-extreme-low-latency.

RAID1 is working well, but if there are timeouts, the RBD volume won't fail and mdadm will not catch the broken device. Writes then hang waiting for the network/RBD to come back. If we force-unmap the RBD device, it fails as expected and writes can continue on the other RAID1 device.

We tried setting `osd_request_timeout` to a small value (2 or 3 seconds), but it only gives us timeouts in the kernel logs:

```
libceph: tid 25792 on osd39 timeout
rbd: rbd0: write at objno 602 0~512 result -110
rbd: rbd0: write result -110
print_req_error: 15 callbacks suppressed
blk_update_request: timeout error, dev rbd0, sector 4931584 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
libceph: tid 25794 on osd39 timeout
rbd: rbd0: write at objno 602 512~512 result -110
rbd: rbd0: write result -110
blk_update_request: timeout error, dev rbd0, sector 4931585 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
```

Is there something we missed, or is it currently impossible with kRBD to "fail fast" on timeout and unmap/remove the associated RBD device? Or is there another client that can do what we want (rbd-nbd or something librbd-based)?

We found this Rook issue, which is not directly helpful but gives some insight: https://github.com/rook/rook/issues/376.

Thanks!

--
Mathias Chapelain
Storage Engineer
Proton AG

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
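
For reference, a minimal sketch of the setup described in the post above. The image name `rbd/md-mirror`, the device names, and the sizes are hypothetical; the `osd_request_timeout` map option is assumed to be supported by the running kernel:

```
# Create and map the RBD image that will back one half of the mirror.
# osd_request_timeout=3 asks libceph to fail in-flight OSD requests with
# ETIMEDOUT (-110) after 3 seconds instead of blocking indefinitely.
rbd create rbd/md-mirror --size 100G
rbd map rbd/md-mirror -o osd_request_timeout=3   # maps to e.g. /dev/rbd0

# Assemble the RAID1 with the local disk first and the RBD device marked
# write-mostly, so reads are served from the local disk when possible.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/nvme0n1 --write-mostly /dev/rbd0
```

The write-mostly flag is what gives the read-latency win described in the Discord post; the open question in the post is how to make the RBD member fail fast enough that mdadm kicks it out of the array on its own.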