Hello,
Before we start: I'm fully aware that this kind of setup is not
recommended by any means, and I'm familiar with its implications. I'm
just trying to practice extreme situations, just in case...
I have a test cluster with:
3 nodes with Proxmox 7.3 + Ceph Quincy 17.2.5
3 monitors + 3 managers in server01, server02 and server03
4 OSDs: two in server01, two in server02, none in server03. All OSDs
have device class "ssd".
1 pool with size=2, min_size=1; its CRUSH rule uses only OSDs of class "ssd".
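Roughly the CLI equivalent of this setup, with placeholder pool/rule
names and PG count:

  # CRUSH rule limited to the "ssd" device class, failure domain = host
  ceph osd crush rule create-replicated replicated_ssd default host ssd

  # pool using that rule ("testpool" and the PG count are just examples)
  ceph osd pool create testpool 32 32 replicated
  ceph osd pool set testpool crush_rule replicated_ssd
  ceph osd pool set testpool size 2
  ceph osd pool set testpool min_size 1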
I wait for ceph status to be fully HEALTH_OK between tests.
A.- If I shut down server01 cleanly, its OSDs get marked down as
expected. I/O on the pool works correctly before, during and after the
shutdown.
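(In case it matters, the I/O tests are nothing fancy, just something
along these lines; the pool name is the placeholder from above:)

  rados bench -p testpool 10 write --no-cleanup
  rados bench -p testpool 10 rand
  rados -p testpool cleanup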
B.- If I power off server01 abruptly (hard power cut), its OSDs do not
get marked down. I/O on the pool does not work at all, neither reads nor
writes. A small number of slow ops shows up in ceph status, somewhere
between 7 and 25. After about 30 minutes, server01's OSDs finally get
marked down, I/O on the pool is restored and the slow ops disappear.
C.- Now I create an OSD on server03 with device class "noClass". This
OSD is not used by the pool. If I now power off server01 abruptly, its
OSDs get marked down as soon as some I/O is sent to the pool, and I/O
keeps working correctly.
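(In case it matters, the class on that new OSD was set along these
lines; osd.4 is just an example ID:)

  # replace the automatically assigned device class with "noClass"
  ceph osd crush rm-device-class osd.4
  ceph osd crush set-device-class noClass osd.4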
Looks like I am in this exact situation:
https://tracker.ceph.com/issues/16910#note-2
Questions:
Why does Ceph behave this way in test B? Shouldn't it simply mark the
OSDs down, as in tests A and C?
Which config setting(s) control that 30-minute wait before all of
server01's OSDs get marked down?
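For what it's worth, these are the options I've been looking at so far,
without being sure they are the relevant ones:

  ceph config get mon mon_osd_report_timeout
  ceph config get mon mon_osd_min_down_reporters
  ceph config get mon mon_osd_reporter_subtree_level
  ceph config get osd osd_heartbeat_grace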
Many thanks in advance!