Re: MDS crashing on startup

Hi Dan,

Forget what I wrote. I forgot the "-a" option for ulimit. It's still limited to 1024. I'm too tired to start a new test now; I will report back tomorrow afternoon/evening.
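For anyone hitting the same confusion, here is a minimal sketch of the difference. With no option, the shell's ulimit builtin reports the file-size limit (the same as "ulimit -f"), which is why it can print "unlimited" while the open-files limit (presumably the 1024 above) is unchanged. The helper name nofile_limit is made up for illustration, and the sketch assumes a Linux host with /proc mounted.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): print the "Max open files" row
# from /proc/<pid>/limits, i.e. the limit a given process actually runs with.
# Usage: nofile_limit [pid]   (defaults to the current shell)
nofile_limit() {
    local pid="${1:-$$}"
    grep '^Max open files' "/proc/${pid}/limits"
}

# Bare `ulimit` is the file-size limit, identical to `ulimit -f`.
echo "file size  (ulimit)     : $(ulimit)"
echo "file size  (ulimit -f)  : $(ulimit -f)"

# The open-files limit has to be asked for explicitly (soft and hard).
echo "open files (ulimit -Sn) : $(ulimit -Sn)"
echo "open files (ulimit -Hn) : $(ulimit -Hn)"

# Limit of this shell as the kernel sees it.
nofile_limit
```

To look at the daemon rather than the shell, something like nofile_limit "$(pidof ceph-mds)" should work, assuming a single ceph-mds process on the host. A service's limits can differ from those of an interactive shell (for example when they are set in the service unit), so checking the running process avoids guessing.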

Thanks for your hint and sorry for the many mails.
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: Tuesday, January 14, 2025 9:11 PM
To: Dan van der Ster
Cc: ceph-users@xxxxxxx
Subject: Re: MDS crashing on startup

Hi Dan,

I celebrated too early. Applying our tuned profile results in:

# sudo -u ceph ulimit
unlimited
# sysctl fs.file-max
fs.file-max = 26234859

Still, the MDS aborts in exactly the same way:

   -88> 2025-01-14T14:57:54.511-0500 7f8a88613700  0 log_channel(cluster) log [DBG] : reconnect by client.426286062 v1:192.168.58.69:0/550867185 after 1.01109
   -87> 2025-01-14T14:57:54.511-0500 7f8a8be1a700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.15/rpm/el8/BUILD/ceph-16.2.15/src/msg/async/AsyncMessenger.cc: In function 'void Processor::accept()' thread 7f8a8be1a700 time 2025-01-14T14:57:54.510412-0500
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.15/rpm/el8/BUILD/ceph-16.2.15/src/msg/async/AsyncMessenger.cc: 214: ceph_abort_msg("abort() called")

 ceph version 16.2.15 (618f440892089921c3e944a991122ddc44e60516) pacific (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x7f8a91068904]
 2: (Processor::accept()+0x862) [0x7f8a9135b502]
 3: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xcb7) [0x7f8a913b0b87]
 4: /usr/lib64/ceph/libceph-common.so.2(+0x5c90bc) [0x7f8a913b70bc]
 5: /lib64/libstdc++.so.6(+0xc2b23) [0x7f8a8f47ab23]
 6: /lib64/libpthread.so.0(+0x81ca) [0x7f8a900451ca]
 7: clone()

   -86> 2025-01-14T14:57:54.511-0500 7f8a88613700  0 log_channel(cluster) log [DBG] : reconnect by client.425227337 v1:192.168.57.49:0/394329910 after 1.01109
[...]
    -1> 2025-01-14T14:57:54.511-0500 7f8a88613700  0 log_channel(cluster) log [DBG] : reconnect by client.425644021 v1:192.168.58.8:0/3860392786 after 1.01109
     0> 2025-01-14T14:57:54.512-0500 7f8a8be1a700 -1 *** Caught signal (Aborted) **
 in thread 7f8a8be1a700 thread_name:msgr-worker-0

 ceph version 16.2.15 (618f440892089921c3e944a991122ddc44e60516) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12d10) [0x7f8a9004fd10]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b6) [0x7f8a910689d5]
 5: (Processor::accept()+0x862) [0x7f8a9135b502]
 6: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xcb7) [0x7f8a913b0b87]
 7: /usr/lib64/ceph/libceph-common.so.2(+0x5c90bc) [0x7f8a913b70bc]
 8: /lib64/libstdc++.so.6(+0xc2b23) [0x7f8a8f47ab23]
 9: /lib64/libpthread.so.0(+0x81ca) [0x7f8a900451ca]
 10: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Are there other places where abort is called? Could it be a signal from another process?

Thanks for helping!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx