after update to 14.2.16 osd daemons begin to crash


 



Hi,

We are currently experiencing OSD daemon crashes and I can't pin down the
cause. I hope someone can help me with it.

* We operate multiple clusters (440 SSD - 1PB, 36 SSD - 126TB, 40 SSD - 100TB,
84 HDD - 680TB)
* All clusters were updated to 14.2.16 around the same time (2021-02-03)
* We restarted ALL ceph daemons (systemctl restart ceph.target) on
2021-02-11 after we added OOMScoreAdjust=-900 to all service files.

Now, in our main cluster (440 SSD with 1PB), the OSD daemons have begun to crash:
# ceph crash ls
ID                                                               ENTITY  NEW
2020-03-06_17:37:54.031675Z_0bbbb807-ff2f-46df-9508-58d319b89bd6 osd.397
2020-05-28_12:23:27.677741Z_061f2449-9a36-4747-a2f8-624e72cd1ad0 osd.410
2021-02-05_07:03:35.943384Z_dffab245-4788-4de2-a677-76b735d5fc01 osd.403
2021-02-15_15:41:27.934194Z_97b57f8f-58f2-4390-9d3e-993874e0e000 osd.395
2021-02-15_18:01:19.774879Z_18160e65-4659-451f-8aae-def2984f1f29 osd.178
2021-02-17_04:51:05.101052Z_9f04c6e8-d0c7-442c-9a38-33d5164d2a83 osd.384

osd.384 and osd.395 are on the same node, which had some memory issues that
we fixed at 2021-02-16 12:00:00.

osd.384 had been marked out for more than 24h when the daemon crashed, and
there were no more misplaced objects in the cluster.
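
To separate the old crash reports from the ones that came after the update,
the list above can be filtered by the timestamp embedded in each crash ID.
This is only a minimal sketch: it assumes the ID layout
<timestamp>Z_<uuid> seen in the output above, and it treats the upgrade date
as midnight UTC because only the day (2021-02-03) is known. The names
CRASH_LS, UPGRADE and parse_crashes are made up for this example.

```python
from datetime import datetime, timezone

# Rows copied verbatim from the `ceph crash ls` output above.
CRASH_LS = """\
ID                                                               ENTITY  NEW
2020-03-06_17:37:54.031675Z_0bbbb807-ff2f-46df-9508-58d319b89bd6 osd.397
2020-05-28_12:23:27.677741Z_061f2449-9a36-4747-a2f8-624e72cd1ad0 osd.410
2021-02-05_07:03:35.943384Z_dffab245-4788-4de2-a677-76b735d5fc01 osd.403
2021-02-15_15:41:27.934194Z_97b57f8f-58f2-4390-9d3e-993874e0e000 osd.395
2021-02-15_18:01:19.774879Z_18160e65-4659-451f-8aae-def2984f1f29 osd.178
2021-02-17_04:51:05.101052Z_9f04c6e8-d0c7-442c-9a38-33d5164d2a83 osd.384
"""

# Assumption: only the day of the update is known, so midnight UTC is used.
UPGRADE = datetime(2021, 2, 3, tzinfo=timezone.utc)


def parse_crashes(text):
    """Return a list of (timestamp, entity) tuples, skipping the header row."""
    crashes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0] == "ID":
            continue
        crash_id, entity = parts[0], parts[1]
        # The UUID contains no underscores, so the last "_" separates it
        # from the timestamp.
        stamp = crash_id.rsplit("_", 1)[0]
        ts = datetime.strptime(stamp, "%Y-%m-%d_%H:%M:%S.%fZ")
        crashes.append((ts.replace(tzinfo=timezone.utc), entity))
    return crashes


crashes = parse_crashes(CRASH_LS)
since_upgrade = [entity for ts, entity in crashes if ts >= UPGRADE]
print(since_upgrade)
```

With the six entries above, four fall on or after the update date (osd.403,
osd.395, osd.178 and osd.384); the two reports from 2020 predate it.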

Here is the latest crash dump:
 --- begin dump of recent events ---
 -9999> 2021-02-17 03:31:31.305 7fcf7e136700  1 do_command 'perf dump'
'result is 30067 bytes
 -9998> 2021-02-17 03:31:31.626 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
 -9997> 2021-02-17 03:31:32.634 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
 -9996> 2021-02-17 03:31:33.639 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
 -9995> 2021-02-17 03:31:34.647 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
 -9994> 2021-02-17 03:31:35.651 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
 -9993> 2021-02-17 03:31:36.654 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
 -9992> 2021-02-17 03:31:37.657 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
 -9991> 2021-02-17 03:31:38.676 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
 -9990> 2021-02-17 03:31:39.680 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
 -9989> 2021-02-17 03:31:40.684 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
 -9988> 2021-02-17 03:31:41.193 7fcf7e136700  1 do_command 'perf dump' '
 -9987> 2021-02-17 03:31:41.193 7fcf7e136700  1 do_command 'perf dump'
'result is 30067 bytes

<snip>

   -31> 2021-02-17 05:50:41.158 7fcf7e136700  1 do_command 'perf dump' '
   -30> 2021-02-17 05:50:41.159 7fcf7e136700  1 do_command 'perf dump'
'result is 30070 bytes
   -29> 2021-02-17 05:50:41.804 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851831808 unmapped: 987750400 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -28> 2021-02-17 05:50:42.813 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851831808 unmapped: 987750400 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -27> 2021-02-17 05:50:43.820 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -26> 2021-02-17 05:50:44.825 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -25> 2021-02-17 05:50:45.831 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -24> 2021-02-17 05:50:46.837 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -23> 2021-02-17 05:50:47.840 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -22> 2021-02-17 05:50:48.843 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -21> 2021-02-17 05:50:49.847 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -20> 2021-02-17 05:50:50.853 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -19> 2021-02-17 05:50:51.524 7fcf7e136700  1 do_command 'perf dump' '
   -18> 2021-02-17 05:50:51.525 7fcf7e136700  1 do_command 'perf dump'
'result is 30070 bytes
   -17> 2021-02-17 05:50:51.859 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -16> 2021-02-17 05:50:52.862 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -15> 2021-02-17 05:50:53.871 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -14> 2021-02-17 05:50:54.875 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -13> 2021-02-17 05:50:55.886 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -12> 2021-02-17 05:50:56.891 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -11> 2021-02-17 05:50:57.905 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
   -10> 2021-02-17 05:50:58.911 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
    -9> 2021-02-17 05:50:59.917 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
    -8> 2021-02-17 05:51:00.929 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
    -7> 2021-02-17 05:51:01.566 7fcf7e136700  1 do_command 'perf dump' '
    -6> 2021-02-17 05:51:01.567 7fcf7e136700  1 do_command 'perf dump'
'result is 30070 bytes
    -5> 2021-02-17 05:51:01.935 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
    -4> 2021-02-17 05:51:02.943 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
    -3> 2021-02-17 05:51:03.949 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851102720 unmapped: 988479488 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
    -2> 2021-02-17 05:51:04.967 7fcf73be6700  5 prioritycache tune_memory
target: 4294967296 mapped: 2851102720 unmapped: 988479488 heap: 3839582208
old mem: 2845415832 new mem: 2845415832
    -1> 2021-02-17 05:51:05.091 7fcf743e7700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/fastbmap_allocator_impl.h:
In function 'uint64_t AllocatorLevel02<T>::claim_free_to_right(uint64_t)
[with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread
7fcf743e7700 time 2021-02-17 05:51:04.998475
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/fastbmap_allocator_impl.h:
572: FAILED ceph_assert(available >= allocated)

 ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x14a) [0x561c84cc2c7d]
 2: (()+0x4d8e45) [0x561c84cc2e45]
 3: (HybridAllocator::_add_to_tree(unsigned long, unsigned long)+0x49e)
[0x561c853167de]
 4: (AvlAllocator::_release(interval_set<unsigned long, std::map<unsigned
long, unsigned long, std::less<unsigned long>,
std::allocator<std::pair<unsigned long const, unsigned long> > > >
const&)+0x60) [0x561c85310b20]
 5: (HybridAllocator::release(interval_set<unsigned long, std::map<unsigned
long, unsigned long, std::less<unsigned long>,
std::allocator<std::pair<unsigned long const, unsigned long> > > >
const&)+0x3a) [0x561c853143ca]
 6: (BlueStore::_txc_release_alloc(BlueStore::TransContext*)+0x5f)
[0x561c851ee83f]
 7: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x1be)
[0x561c8522f4ae]
 8: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0xaa)
[0x561c8522fe9a]
 9: (BlueStore::_kv_finalize_thread()+0x604) [0x561c85232ed4]
 10: (BlueStore::KVFinalizeThread::entry()+0xd) [0x561c852625ed]
 11: (()+0x7ea5) [0x7fcf840a2ea5]
 12: (clone()+0x6d) [0x7fcf82f6596d]

     0> 2021-02-17 05:51:05.145 7fcf743e7700 -1 *** Caught signal (Aborted)
**
 in thread 7fcf743e7700 thread_name:bstore_kv_final

 ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
(stable)
 1: (()+0xf630) [0x7fcf840aa630]
 2: (gsignal()+0x37) [0x7fcf82e9d387]
 3: (abort()+0x148) [0x7fcf82e9ea78]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x199) [0x561c84cc2ccc]
 5: (()+0x4d8e45) [0x561c84cc2e45]
 6: (HybridAllocator::_add_to_tree(unsigned long, unsigned long)+0x49e)
[0x561c853167de]
 7: (AvlAllocator::_release(interval_set<unsigned long, std::map<unsigned
long, unsigned long, std::less<unsigned long>,
std::allocator<std::pair<unsigned long const, unsigned long> > > >
const&)+0x60) [0x561c85310b20]
 8: (HybridAllocator::release(interval_set<unsigned long, std::map<unsigned
long, unsigned long, std::less<unsigned long>,
std::allocator<std::pair<unsigned long const, unsigned long> > > >
const&)+0x3a) [0x561c853143ca]
 9: (BlueStore::_txc_release_alloc(BlueStore::TransContext*)+0x5f)
[0x561c851ee83f]
 10: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x1be)
[0x561c8522f4ae]
 11: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0xaa)
[0x561c8522fe9a]
 12: (BlueStore::_kv_finalize_thread()+0x604) [0x561c85232ed4]
 13: (BlueStore::KVFinalizeThread::entry()+0xd) [0x561c852625ed]
 14: (()+0x7ea5) [0x7fcf840a2ea5]
 15: (clone()+0x6d) [0x7fcf82f6596d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
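
Since the node had memory issues, it may be worth checking what the
tune_memory lines in the dump actually say. The sketch below parses the first
and the last such entry (re-joined onto single lines) and checks the
arithmetic. It assumes that "target" is the configured OSD memory target and
that mapped + unmapped add up to the heap size; both are read off the log
fields, not taken from the Ceph source. The names LOG, FIELD and
parse_tune_memory are made up for this example.

```python
import re

# First and last tune_memory entries from the dump above, with the
# mail-wrapped lines joined back together.
LOG = """\
2021-02-17 03:31:31.626 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
2021-02-17 05:51:04.967 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851102720 unmapped: 988479488 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
"""

# "unmapped" is listed before "mapped" so the longer name is tried first.
FIELD = re.compile(r"(target|unmapped|mapped|heap|old mem|new mem): (\d+)")


def parse_tune_memory(line):
    """Return the numeric fields of one tune_memory log line as a dict."""
    return {name: int(value) for name, value in FIELD.findall(line)}


entries = [parse_tune_memory(line) for line in LOG.splitlines()]

for e in entries:
    gib = 2 ** 30
    print(
        f"target {e['target'] / gib:.2f} GiB, "
        f"heap {e['heap'] / gib:.2f} GiB, "
        f"mapped {e['mapped'] / gib:.2f} GiB, "
        f"mapped+unmapped == heap: {e['mapped'] + e['unmapped'] == e['heap']}"
    )
```

In both entries mapped + unmapped equals heap, the heap stays below the 4 GiB
target, and "old mem" equals "new mem", so the cache tuner made no adjustment
over the roughly 2h20m covered by the dump. On these numbers alone the dump
does not look like memory pressure right before the assert, though that is
only a reading of the log and not a diagnosis.
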
-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



