Don’t worry about the number of emails, I believe everyone running
CephFS is rooting for you right now, hoping to eventually learn how to
get out of that situation. :-)
Quoting Frank Schilder <frans@xxxxxx>:
Hi Dan,
please forget what I wrote. I had forgotten the "-a" option for ulimit.
The open-files limit is still 1024. I'm too tired to start a new test
now; I will report back tomorrow afternoon/evening.
Thanks for your hint and sorry for the many mails.
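For anyone hitting the same confusion: in bash, a bare `ulimit` reports the
maximum file size (the `-f` limit), not the open-file-descriptor limit, so it
can print "unlimited" while the descriptor limit is still at the 1024 default.
A quick sketch of the distinction:

```shell
# Bare `ulimit` is shorthand for `ulimit -f` (max file size), often "unlimited"
ulimit

# The open-file-descriptor limit needs -n explicitly; 1024 is a common default
ulimit -n

# `ulimit -a` lists every limit at once, which is how the 1024 shows up
ulimit -a | grep -i 'open files'
```

This is why the earlier check looked fine even though the MDS was still
running out of descriptors.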
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: Tuesday, January 14, 2025 9:11 PM
To: Dan van der Ster
Cc: ceph-users@xxxxxxx
Subject: Re: MDS crashing on startup
Hi Dan,
I was celebrating too early. Applying our tuned profile results in:
# sudo -u ceph ulimit
unlimited
# sysctl fs.file-max
fs.file-max = 26234859
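One caveat with checking this from a shell: `ulimit` only reflects the limits
of the shell itself, not of an already-running daemon. A more reliable check,
sketched here assuming the MDS is a running `ceph-mds` process (the PID lookup
and systemd unit name are illustrative), is to read the process's own limits:

```shell
# Look up the oldest ceph-mds process (illustrative; adjust for your setup)
MDS_PID=$(pgrep -o ceph-mds)

# /proc/<pid>/limits shows the limits actually in effect for that process
grep 'open files' /proc/"${MDS_PID}"/limits

# Equivalent query via util-linux, if available:
prlimit --pid "${MDS_PID}" --nofile

# For a systemd-managed MDS, the value is typically set by LimitNOFILE, e.g.:
#   systemctl show ceph-mds@$(hostname -s) --property=LimitNOFILE
```

If the daemon was started before the tuned profile was applied, it will keep
its old limit until restarted.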
Still, the MDS aborts in exactly the same way:
-88> 2025-01-14T14:57:54.511-0500 7f8a88613700 0
log_channel(cluster) log [DBG] : reconnect by client.426286062
v1:192.168.58.69:0/550867185 after 1.01109
-87> 2025-01-14T14:57:54.511-0500 7f8a8be1a700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.15/rpm/el8/BUILD/ceph-16.2.15/src/msg/async/AsyncMessenger.cc: In function 'void Processor::accept()' thread 7f8a8be1a700 time
2025-01-14T14:57:54.510412-0500
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.15/rpm/el8/BUILD/ceph-16.2.15/src/msg/async/AsyncMessenger.cc: 214: ceph_abort_msg("abort()
called")
ceph version 16.2.15 (618f440892089921c3e944a991122ddc44e60516)
pacific (stable)
1: (ceph::__ceph_abort(char const*, int, char const*,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&)+0xe5) [0x7f8a91068904]
2: (Processor::accept()+0x862) [0x7f8a9135b502]
3: (EventCenter::process_events(unsigned int,
std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l>
>*)+0xcb7) [0x7f8a913b0b87]
4: /usr/lib64/ceph/libceph-common.so.2(+0x5c90bc) [0x7f8a913b70bc]
5: /lib64/libstdc++.so.6(+0xc2b23) [0x7f8a8f47ab23]
6: /lib64/libpthread.so.0(+0x81ca) [0x7f8a900451ca]
7: clone()
-86> 2025-01-14T14:57:54.511-0500 7f8a88613700 0
log_channel(cluster) log [DBG] : reconnect by client.425227337
v1:192.168.57.49:0/394329910 after 1.01109
[...]
-1> 2025-01-14T14:57:54.511-0500 7f8a88613700 0
log_channel(cluster) log [DBG] : reconnect by client.425644021
v1:192.168.58.8:0/3860392786 after 1.01109
0> 2025-01-14T14:57:54.512-0500 7f8a8be1a700 -1 *** Caught
signal (Aborted) **
in thread 7f8a8be1a700 thread_name:msgr-worker-0
ceph version 16.2.15 (618f440892089921c3e944a991122ddc44e60516)
pacific (stable)
1: /lib64/libpthread.so.0(+0x12d10) [0x7f8a9004fd10]
2: gsignal()
3: abort()
4: (ceph::__ceph_abort(char const*, int, char const*,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&)+0x1b6) [0x7f8a910689d5]
5: (Processor::accept()+0x862) [0x7f8a9135b502]
6: (EventCenter::process_events(unsigned int,
std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l>
>*)+0xcb7) [0x7f8a913b0b87]
7: /usr/lib64/ceph/libceph-common.so.2(+0x5c90bc) [0x7f8a913b70bc]
8: /lib64/libstdc++.so.6(+0xc2b23) [0x7f8a8f47ab23]
9: /lib64/libpthread.so.0(+0x81ca) [0x7f8a900451ca]
10: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
Are there other places where abort is called? Could it be a signal
from another process?
Thanks for helping!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx