Re: OSDs remain not in after update to v17

Hi

I think the issue you are experiencing may be related to a bug reported in
the Ceph tracker. Specifically, it is documented in
https://tracker.ceph.com/issues/58156, and a fix has been submitted and
merged in https://github.com/ceph/ceph/pull/44090.
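
Before going further, it may be worth checking what the cluster map itself
says, since the file on the OSDs can lag behind; `ceph osd dump` reports the
authoritative value, and 14 would correspond to Nautilus. A quick check (the
output line below is illustrative, not from your cluster):

# ceph osd dump | grep require_osd_release
require_osd_release nautilus

Running `ceph versions` should also show every daemon already on 17.2.x
before the flag is bumped. If the map is still at nautilus while everything
runs Quincy, that would fit the situation the tracker issue describes.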


On Fri, Apr 14, 2023 at 8:17 PM Alexandre Becholey <alex@xxxxxxxxxxx> wrote:

> Dear Ceph Users,
>
> I have a small ceph cluster for VMs on my local machine. It used to be
> installed with the system packages, and I migrated it to Docker following
> the documentation. It worked OK until I migrated from v16 to v17 a few
> months ago. Now the OSDs remain down ("not up"), as shown in the status:
>
> # ceph -s
>   cluster:
>     id:     abef2e91-cd07-4359-b457-f0f8dc753dfa
>     health: HEALTH_WARN
>             6 stray daemon(s) not managed by cephadm
>             1 stray host(s) with 6 daemon(s) not managed by cephadm
>             2 devices (4 osds) down
>             4 osds down
>             1 host (4 osds) down
>             1 root (4 osds) down
>             Reduced data availability: 129 pgs inactive
>
>   services:
>     mon: 1 daemons, quorum bjorn (age 8m)
>     mgr: bjorn(active, since 8m)
>     osd: 4 osds: 0 up (since 4w), 4 in (since 4w)
>
>   data:
>     pools:   2 pools, 129 pgs
>     objects: 0 objects, 0 B
>     usage:   1.8 TiB used, 1.8 TiB / 3.6 TiB avail
>     pgs:     100.000% pgs unknown
>              129 unknown
>
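
The "stray daemon(s) not managed by cephadm" warnings suggest the migration
from packages to containers did not fully register with cephadm. Assuming you
migrated with `cephadm adopt` as in the docs, `cephadm ls` on the host reads
local state only (it does not go through the mgr, so it should work even
while `ceph orch` hangs) and shows which daemons cephadm considers its own:

# cephadm ls | grep -E '"name"|"style"'

Adopted daemons should show a style of "cephadm:v1"; anything marked
"legacy" was never adopted.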
> I can see some network communication between the OSDs and the monitor,
> and the OSDs are running:
>
> # docker ps -a
> CONTAINER ID   IMAGE                   COMMAND                  CREATED         STATUS         PORTS     NAMES
> f8fbe8177a63   quay.io/ceph/ceph:v17   "/usr/bin/ceph-osd -…"   9 minutes ago   Up 9 minutes             ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-2
> 6768ec871404   quay.io/ceph/ceph:v17   "/usr/bin/ceph-osd -…"   9 minutes ago   Up 9 minutes             ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-1
> ff82f84504d5   quay.io/ceph/ceph:v17   "/usr/bin/ceph-osd -…"   9 minutes ago   Up 9 minutes             ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-0
> 4c89e50ce974   quay.io/ceph/ceph:v17   "/usr/bin/ceph-osd -…"   9 minutes ago   Up 9 minutes             ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-3
> fe0b6089edda   quay.io/ceph/ceph:v17   "/usr/bin/ceph-mon -…"   9 minutes ago   Up 9 minutes             ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-mon-bjorn
> f76ac9dcdd6d   quay.io/ceph/ceph:v17   "/usr/bin/ceph-mgr -…"   9 minutes ago   Up 9 minutes             ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-mgr-bjorn
>
> However, when I try to use any `ceph orch` command, it hangs. I can also
> see some entries in the OSD blocklist:
>
> # ceph osd blocklist ls
> 10.99.0.13:6833/3770763474 2023-04-13T08:17:38.885128+0000
> 10.99.0.13:6832/3770763474 2023-04-13T08:17:38.885128+0000
> 10.99.0.13:0/2634718754 2023-04-13T08:17:38.885128+0000
> 10.99.0.13:0/1103315748 2023-04-13T08:17:38.885128+0000
> listed 4 entries
>
> The first two entries correspond to the manager process. `ceph osd
> blocked-by` does not show anything.
>
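
A hanging `ceph orch` usually points at the cephadm module in the active mgr
rather than at the OSDs, and a blocklisted mgr address fits that picture.
Failing over the active mgr gives it a fresh address, and stale entries can
be removed explicitly; a sketch, using one of the addresses from your list:

# ceph mgr fail
# ceph osd blocklist rm 10.99.0.13:0/2634718754

If I read the output right, the timestamp in `blocklist ls` is the expiry,
so entries also age out on their own; removing them just speeds that up.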
> I think I might have forgotten to run `ceph osd require-osd-release
> ...` during the upgrades, because 14 is written in
> `/var/lib/ceph/<ID>/osd.?/require_osd_release`. If I try to run it now,
> the monitor hits an abort:
>
> debug      0> 2023-04-12T08:43:27.788+0000 7f0fcf2aa700 -1 *** Caught signal (Aborted) **
>  in thread 7f0fcf2aa700 thread_name:ms_dispatch
>  ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
>  1: /lib64/libpthread.so.0(+0x12cf0) [0x7f0fd94bbcf0]
>  2: gsignal()
>  3: abort()
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f0fdb5124e3]
>  5: /usr/lib64/ceph/libceph-common.so.2(+0x26a64f) [0x7f0fdb51264f]
>  6: (OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basi
>  7: (OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>)+0x38d) [0x562719cb127d]
>  8: (OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x17b) [0x562719cb18cb]
>  9: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x2ce) [0x562719c20ade]
>  10: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x1ebb) [0x562719ab9f6b]
>  11: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x9f2) [0x562719abe152]
>  12: (Monitor::_ms_dispatch(Message*)+0x406) [0x562719abf066]
>  13: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5d) [0x562719aef13d]
>  14: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x478) [0x7f0fdb78e0e8]
>  15: (DispatchQueue::entry()+0x50f) [0x7f0fdb78b52f]
>  16: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f0fdb8543b1]
>  17: /lib64/libpthread.so.0(+0x81ca) [0x7f0fd94b11ca]
>  18: clone()
>
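
That abort in OSDMonitor::prepare_command_impl is consistent with the
tracker issue I linked above. Note that 14 is Nautilus, and Quincy only
supports upgrading from Octopus (15) or Pacific (16), so moving the flag
from 14 straight to 17 is outside the supported window, which may be why
the command trips an assertion instead of returning a clean error. Once the
OSDs are up and the monitor has the fix, the usual path is to raise the flag
one release at a time; a hypothetical sequence, assuming the monitor accepts
the intermediate values:

# ceph osd require-osd-release octopus
# ceph osd require-osd-release pacific
# ceph osd require-osd-release quincy

I would hold off retrying until the daemons are healthy; a monitor that
aborts on every attempt will just keep crashing.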
> Any ideas on what is going on?
>
> Many thanks,
> Alexandre
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



