Re: osd crash when using rdma

Did you add the configuration directly to ceph.conf?
From other people's posts, it looks like Ceph has to be recompiled with RDMA support before these settings work.
I'm also about to try RDMA mode, but I haven't found much more information.
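
One rough way to check whether a given ceph-osd binary was built with RDMA support is to see which libraries it links against. Below is a minimal sketch, not a definitive test. It assumes the binary is dynamically linked, that it lives at /usr/bin/ceph-osd, and that an RDMA-enabled build pulls in libibverbs and librdmacm; I have not confirmed these for every distribution package. The helper names are my own. In the trace quoted below the crash is inside RDMAStack::RDMAStack, so that particular binary appears to contain the RDMA code already.

```python
import subprocess

# Libraries an RDMA-enabled build is ASSUMED to link against.
RDMA_LIBS = ("libibverbs", "librdmacm")


def linked_rdma_libs(ldd_output):
    """Return the RDMA library names that appear in `ldd` output."""
    found = []
    for line in ldd_output.splitlines():
        # Each ldd line starts with the library's soname, e.g. "libibverbs.so.1".
        parts = line.split()
        if not parts:
            continue
        for lib in RDMA_LIBS:
            if parts[0].startswith(lib) and lib not in found:
                found.append(lib)
    return found


def check_binary(path="/usr/bin/ceph-osd"):
    """Run ldd on the binary (path is an assumption) and report RDMA libs."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return linked_rdma_libs(result.stdout)
```

Calling check_binary() on the OSD host returns the RDMA libraries it found; an empty list would hint that the package was built without RDMA, though a statically linked build would also give an empty list.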

sascha a. <sascha.arthur@xxxxxxxxx> wrote on Tue, Feb 1, 2022 at 20:31:

> Hey,
>
> I recently found the RDMA feature of Ceph, which I'm currently trying out.
>
> # rdma dev
> 0: mlx4_0: node_type ca fw 2.42.5000 node_guid 0010:e000:0189:1984 sys_image_guid 0010:e000:0189:1987
>
> rdma_server and rdma_ping work, as does "udaddy".
>
> I stopped one of my OSDs and added the following lines to ceph.conf:
>
>     ms_type = async+rdma
>     ms_cluster_type = async+rdma
>     ms_async_rdma_device_name = mlx4_0
>     ms_async_rdma_polling_us = 0
>
> Then I restarted the OSD, and it crashes with:
>
> ** Caught signal (Segmentation fault) **
>  in thread 7f2c8dc132c0 thread_name:ceph-osd
>  ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific
> (stable)
>  1: /lib/x86_64-linux-gnu/libc.so.6(+0x46520) [0x7f2c8de65520]
>  2: (RDMAStack::RDMAStack(ceph::common::CephContext*)+0x1c3)
> [0x5648024abdf3]
>  3: (NetworkStack::create(ceph::common::CephContext*,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&)+0x298) [0x5648024a3ae8]
>  4: (AsyncMessenger::AsyncMessenger(ceph::common::CephContext*,
> entity_name_t, std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> >, unsigned long)+0x796)
> [0x564802496ed6]
>  5: (Messenger::create(ceph::common::CephContext*,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&, entity_name_t,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >, unsigned long)+0xd8) [0x564802489f98]
>  6: (Messenger::create_client_messenger(ceph::common::CephContext*,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >)+0x168) [0x56480248a3a8]
>  7: (MonClient::get_monmap_and_config()+0x138) [0x5648024f0048]
>  8: (global_init(std::map<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> >,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >, std::less<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> std::allocator<std::pair<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > > > > const*, std::vector<char const*,
> std::allocator<char const*> >&, unsigned int, code_environment_t, int,
> bool)+0x45f) [0x5648022abbef]
>  9: main()
>  10: /lib/x86_64-linux-gnu/libc.so.6(+0x2dfd0) [0x7f2c8de4cfd0]
>  11: __libc_start_main()
>  12: _start()
> 2022-02-01T12:21:56.261+0000 7f2c8dc132c0 -1 *** Caught signal
> (Segmentation fault) **
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
>    -43> 2022-02-01T12:21:56.261+0000 7f2c8dc132c0 -1 *** Caught signal
> (Segmentation fault) **
>
> Here's the gdb debug output:
>
> Thread 1 "ceph-osd" received signal SIGSEGV, Segmentation fault.
> 0x00005555563c9df3 in RDMAStack::RDMAStack(ceph::common::CephContext*) ()
> (gdb) backtrace
> #0  0x00005555563c9df3 in RDMAStack::RDMAStack(ceph::common::CephContext*)
> ()
> #1  0x00005555563c1ae8 in NetworkStack::create(ceph::common::CephContext*,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&) ()
> #2  0x00005555563b4ed6 in
> AsyncMessenger::AsyncMessenger(ceph::common::CephContext*, entity_name_t,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> >, unsigned long) ()
> #3  0x00005555563a7f98 in Messenger::create(ceph::common::CephContext*,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > const&, entity_name_t,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >, unsigned long) ()
> #4  0x00005555563a83a8 in
> Messenger::create_client_messenger(ceph::common::CephContext*,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >) ()
> #5  0x000055555640e048 in MonClient::get_monmap_and_config() ()
> #6  0x00005555561c9bef in
> global_init(std::map<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> >,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >, std::less<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> std::allocator<std::pair<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > const,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> > > > > const*, std::vector<char const*,
> std::allocator<char const*> >&, unsigned int, code_environment_t, int,
> bool) ()
> #7  0x0000555555ae299b in main ()
>
> I'm using Ceph 16.2.6 on Ubuntu 21.10 (impish). Has anyone run into this problem?
>
> Thanks!
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>