Hi,

I think the issue you are experiencing may be related to a bug that has been reported in the Ceph project. Specifically, the issue is documented in https://tracker.ceph.com/issues/58156, and a pull request has been submitted and merged in https://github.com/ceph/ceph/pull/44090.

On Fri, Apr 14, 2023 at 8:17 PM Alexandre Becholey <alex@xxxxxxxxxxx> wrote:
> Dear Ceph Users,
>
> I have a small Ceph cluster for VMs on my local machine. It used to be
> installed with the system packages and I migrated it to docker following
> the documentation. It worked OK until I migrated from v16 to v17 a few
> months ago. Now the OSDs remain "not in", as shown in the status:
>
> # ceph -s
>   cluster:
>     id:     abef2e91-cd07-4359-b457-f0f8dc753dfa
>     health: HEALTH_WARN
>             6 stray daemon(s) not managed by cephadm
>             1 stray host(s) with 6 daemon(s) not managed by cephadm
>             2 devices (4 osds) down
>             4 osds down
>             1 host (4 osds) down
>             1 root (4 osds) down
>             Reduced data availability: 129 pgs inactive
>
>   services:
>     mon: 1 daemons, quorum bjorn (age 8m)
>     mgr: bjorn(active, since 8m)
>     osd: 4 osds: 0 up (since 4w), 4 in (since 4w)
>
>   data:
>     pools:   2 pools, 129 pgs
>     objects: 0 objects, 0 B
>     usage:   1.8 TiB used, 1.8 TiB / 3.6 TiB avail
>     pgs:     100.000% pgs unknown
>              129 unknown
>
> I can see some network communication between the OSDs and the monitor,
> and the OSDs are running:
>
> # docker ps -a
> CONTAINER ID   IMAGE                   COMMAND                  CREATED         STATUS         PORTS   NAMES
> f8fbe8177a63   quay.io/ceph/ceph:v17   "/usr/bin/ceph-osd -…"   9 minutes ago   Up 9 minutes           ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-2
> 6768ec871404   quay.io/ceph/ceph:v17   "/usr/bin/ceph-osd -…"   9 minutes ago   Up 9 minutes           ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-1
> ff82f84504d5   quay.io/ceph/ceph:v17   "/usr/bin/ceph-osd -…"   9 minutes ago   Up 9 minutes           ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-0
> 4c89e50ce974   quay.io/ceph/ceph:v17   "/usr/bin/ceph-osd -…"   9 minutes ago   Up 9 minutes           ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-3
> fe0b6089edda   quay.io/ceph/ceph:v17   "/usr/bin/ceph-mon -…"   9 minutes ago   Up 9 minutes           ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-mon-bjorn
> f76ac9dcdd6d   quay.io/ceph/ceph:v17   "/usr/bin/ceph-mgr -…"   9 minutes ago   Up 9 minutes           ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-mgr-bjorn
>
> However, when I try to use any `ceph orch` commands, they hang. I can
> also see some blocklist entries for the OSDs:
>
> # ceph osd blocklist ls
> 10.99.0.13:6833/3770763474 2023-04-13T08:17:38.885128+0000
> 10.99.0.13:6832/3770763474 2023-04-13T08:17:38.885128+0000
> 10.99.0.13:0/2634718754 2023-04-13T08:17:38.885128+0000
> 10.99.0.13:0/1103315748 2023-04-13T08:17:38.885128+0000
> listed 4 entries
>
> The first two entries correspond to the manager process. `ceph osd
> blocked-by` does not show anything.
>
> I think I might have forgotten to set `ceph osd require-osd-release ...`,
> because 14 is written in `/var/lib/ceph/<ID>/osd.?/require_osd_release`.
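
That would be worth cross-checking against what the osdmap itself currently requires. Assuming the monitor is reachable from a client with an admin keyring (your `ceph osd blocklist ls` output suggests it is), a quick check, not a fix, would be:

# ceph osd dump | grep require_osd_release

On disk, 14 corresponds to Nautilus, so if the command above also reports an old release, that would be consistent with the file you mention and with the tracker issue linked at the top.
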
> If I try to do it now, the monitor hits an abort:
>
> debug     0> 2023-04-12T08:43:27.788+0000 7f0fcf2aa700 -1 *** Caught signal (Aborted) **
>  in thread 7f0fcf2aa700 thread_name:ms_dispatch
>
>  ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
>  1: /lib64/libpthread.so.0(+0x12cf0) [0x7f0fd94bbcf0]
>  2: gsignal()
>  3: abort()
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f0fdb5124e3]
>  5: /usr/lib64/ceph/libceph-common.so.2(+0x26a64f) [0x7f0fdb51264f]
>  6: (OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basi
>  7: (OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>)+0x38d) [0x562719cb127d]
>  8: (OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x17b) [0x562719cb18cb]
>  9: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x2ce) [0x562719c20ade]
>  10: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x1ebb) [0x562719ab9f6b]
>  11: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x9f2) [0x562719abe152]
>  12: (Monitor::_ms_dispatch(Message*)+0x406) [0x562719abf066]
>  13: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5d) [0x562719aef13d]
>  14: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x478) [0x7f0fdb78e0e8]
>  15: (DispatchQueue::entry()+0x50f) [0x7f0fdb78b52f]
>  16: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f0fdb8543b1]
>  17: /lib64/libpthread.so.0(+0x81ca) [0x7f0fd94b11ca]
>  18: clone()
>
> Any ideas on what is going on?
>
> Many thanks,
> Alexandre
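
One more data point that could help confirm the match: the backtrace goes through ceph::__ceph_assert_fail(), so the monitor log should contain the exact failed assertion just before the abort. A rough way to pull it out of your docker-based setup (container name taken from your `docker ps` output above; the grep pattern and amount of context are just a guess) would be:

# docker logs ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-mon-bjorn 2>&1 | grep -B 20 'Caught signal'

If the assertion it reports matches the one discussed in the tracker issue, that would confirm you are hitting the same bug.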