Re: after update to 14.2.16 osd daemons begin to crash

Hi Igor,

this is good news for me. Do you have an idea in which version the fix will
be released, and can you tell me how I can track whether the fix is in a
given release?

I will read a bit about the allocators, but I doubt we will make the switch;
we will most likely just wait it out (if it does not take a year) :)
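
For anyone who does want the workaround Igor describes below, here is a minimal sketch of what the switch might look like. It only builds and prints the commands; it does not run them. It assumes the allocator is selected through the `bluestore_allocator` option, that the setting only takes effect after an OSD restart, and that the OSDs run as `ceph-osd@<id>` systemd units on their own hosts. The helper name `allocator_switch_commands` is made up for illustration.

```python
# Allocators Igor names as workarounds in this thread.
WORKAROUND_ALLOCATORS = {"bitmap", "avl"}


def allocator_switch_commands(allocator, osd_ids):
    """Return the commands (as argument lists) for switching the allocator.

    Nothing is executed here; the caller decides how and where to run them.
    """
    if allocator not in WORKAROUND_ALLOCATORS:
        raise ValueError(f"not one of the suggested allocators: {allocator!r}")

    # Cluster-wide setting for all OSDs (assumed option name: bluestore_allocator).
    commands = [["ceph", "config", "set", "osd", "bluestore_allocator", allocator]]

    # Assumption: each OSD has to be restarted on its own host to pick up the change.
    for osd_id in osd_ids:
        commands.append(["systemctl", "restart", f"ceph-osd@{osd_id}"])
    return commands


if __name__ == "__main__":
    for cmd in allocator_switch_commands("bitmap", [384, 395]):
        print(" ".join(cmd))
```

Restarting the OSDs one at a time and waiting for the cluster to settle in between would be the cautious way to roll this out.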

Thank you a lot.

On Wed, 17 Feb 2021 at 10:59, Igor Fedotov <ifedotov@xxxxxxx> wrote:

> Hi Boris,
>
> most likely you've hit https://tracker.ceph.com/issues/47751
>
> It's fixed in the upcoming Nautilus release, but v14.2.16 still lacks the fix.
>
> As a workaround, you might want to switch back to the bitmap or avl allocator.
>
> Thanks,
>
> Igor
>
>
> On 2/17/2021 12:36 PM, Boris Behrens wrote:
> > Hi,
> >
> > we are currently experiencing OSD daemon crashes and I can't pin down
> > the issue. I hope someone can help me with it.
> >
> > * We operate multiple clusters (440 SSD - 1PB, 36 SSD - 126TB,
> >   40 SSD - 100TB, 84 HDD - 680TB)
> > * All clusters were updated around the same time (2021-02-03)
> > * We restarted ALL ceph daemons (systemctl restart ceph.target) on
> >   2021-02-11 after we added OOMScoreAdjust=-900 to all service files.
> >
> > Now, in our main cluster (440 SSD with 1PB), the OSD daemons have begun to crash:
> > # ceph crash ls
> > ID                                                               ENTITY  NEW
> > 2020-03-06_17:37:54.031675Z_0bbbb807-ff2f-46df-9508-58d319b89bd6 osd.397
> > 2020-05-28_12:23:27.677741Z_061f2449-9a36-4747-a2f8-624e72cd1ad0 osd.410
> > 2021-02-05_07:03:35.943384Z_dffab245-4788-4de2-a677-76b735d5fc01 osd.403
> > 2021-02-15_15:41:27.934194Z_97b57f8f-58f2-4390-9d3e-993874e0e000 osd.395
> > 2021-02-15_18:01:19.774879Z_18160e65-4659-451f-8aae-def2984f1f29 osd.178
> > 2021-02-17_04:51:05.101052Z_9f04c6e8-d0c7-442c-9a38-33d5164d2a83 osd.384
> >
> > osd.384 and osd.395 are on the same node, which had some memory issues
> > that we fixed at 2021-02-16_12:00:00.
> >
> > osd.384 had been marked out for >24h when the daemon crashed, and there
> > were no more misplaced objects in the cluster.
> >
> > Here is the latest crash dump:
> >   --- begin dump of recent events ---
> >   -9999> 2021-02-17 03:31:31.305 7fcf7e136700  1 do_command 'perf dump' 'result is 30067 bytes
> >   -9998> 2021-02-17 03:31:31.626 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >   -9997> 2021-02-17 03:31:32.634 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >   -9996> 2021-02-17 03:31:33.639 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >   -9995> 2021-02-17 03:31:34.647 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >   -9994> 2021-02-17 03:31:35.651 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >   -9993> 2021-02-17 03:31:36.654 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >   -9992> 2021-02-17 03:31:37.657 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >   -9991> 2021-02-17 03:31:38.676 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >   -9990> 2021-02-17 03:31:39.680 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >   -9989> 2021-02-17 03:31:40.684 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2882789376 unmapped: 956792832 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >   -9988> 2021-02-17 03:31:41.193 7fcf7e136700  1 do_command 'perf dump' '
> >   -9987> 2021-02-17 03:31:41.193 7fcf7e136700  1 do_command 'perf dump' 'result is 30067 bytes
> >
> > <snip>
> >
> >     -31> 2021-02-17 05:50:41.158 7fcf7e136700  1 do_command 'perf dump' '
> >     -30> 2021-02-17 05:50:41.159 7fcf7e136700  1 do_command 'perf dump' 'result is 30070 bytes
> >     -29> 2021-02-17 05:50:41.804 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851831808 unmapped: 987750400 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -28> 2021-02-17 05:50:42.813 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851831808 unmapped: 987750400 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -27> 2021-02-17 05:50:43.820 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -26> 2021-02-17 05:50:44.825 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -25> 2021-02-17 05:50:45.831 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -24> 2021-02-17 05:50:46.837 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -23> 2021-02-17 05:50:47.840 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -22> 2021-02-17 05:50:48.843 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -21> 2021-02-17 05:50:49.847 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -20> 2021-02-17 05:50:50.853 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -19> 2021-02-17 05:50:51.524 7fcf7e136700  1 do_command 'perf dump' '
> >     -18> 2021-02-17 05:50:51.525 7fcf7e136700  1 do_command 'perf dump' 'result is 30070 bytes
> >     -17> 2021-02-17 05:50:51.859 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -16> 2021-02-17 05:50:52.862 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -15> 2021-02-17 05:50:53.871 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -14> 2021-02-17 05:50:54.875 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -13> 2021-02-17 05:50:55.886 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -12> 2021-02-17 05:50:56.891 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -11> 2021-02-17 05:50:57.905 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >     -10> 2021-02-17 05:50:58.911 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >      -9> 2021-02-17 05:50:59.917 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >      -8> 2021-02-17 05:51:00.929 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >      -7> 2021-02-17 05:51:01.566 7fcf7e136700  1 do_command 'perf dump' '
> >      -6> 2021-02-17 05:51:01.567 7fcf7e136700  1 do_command 'perf dump' 'result is 30070 bytes
> >      -5> 2021-02-17 05:51:01.935 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >      -4> 2021-02-17 05:51:02.943 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851405824 unmapped: 988176384 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >      -3> 2021-02-17 05:51:03.949 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851102720 unmapped: 988479488 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >      -2> 2021-02-17 05:51:04.967 7fcf73be6700  5 prioritycache tune_memory target: 4294967296 mapped: 2851102720 unmapped: 988479488 heap: 3839582208 old mem: 2845415832 new mem: 2845415832
> >      -1> 2021-02-17 05:51:05.091 7fcf743e7700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/fastbmap_allocator_impl.h: In function 'uint64_t AllocatorLevel02<T>::claim_free_to_right(uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7fcf743e7700 time 2021-02-17 05:51:04.998475
> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.16/rpm/el7/BUILD/ceph-14.2.16/src/os/bluestore/fastbmap_allocator_impl.h: 572: FAILED ceph_assert(available >= allocated)
> >
> >   ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable)
> >   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x561c84cc2c7d]
> >   2: (()+0x4d8e45) [0x561c84cc2e45]
> >   3: (HybridAllocator::_add_to_tree(unsigned long, unsigned long)+0x49e) [0x561c853167de]
> >   4: (AvlAllocator::_release(interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > > const&)+0x60) [0x561c85310b20]
> >   5: (HybridAllocator::release(interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > > const&)+0x3a) [0x561c853143ca]
> >   6: (BlueStore::_txc_release_alloc(BlueStore::TransContext*)+0x5f) [0x561c851ee83f]
> >   7: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x1be) [0x561c8522f4ae]
> >   8: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0xaa) [0x561c8522fe9a]
> >   9: (BlueStore::_kv_finalize_thread()+0x604) [0x561c85232ed4]
> >   10: (BlueStore::KVFinalizeThread::entry()+0xd) [0x561c852625ed]
> >   11: (()+0x7ea5) [0x7fcf840a2ea5]
> >   12: (clone()+0x6d) [0x7fcf82f6596d]
> >
> >       0> 2021-02-17 05:51:05.145 7fcf743e7700 -1 *** Caught signal (Aborted) **
> >   in thread 7fcf743e7700 thread_name:bstore_kv_final
> >
> >   ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable)
> >   1: (()+0xf630) [0x7fcf840aa630]
> >   2: (gsignal()+0x37) [0x7fcf82e9d387]
> >   3: (abort()+0x148) [0x7fcf82e9ea78]
> >   4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x561c84cc2ccc]
> >   5: (()+0x4d8e45) [0x561c84cc2e45]
> >   6: (HybridAllocator::_add_to_tree(unsigned long, unsigned long)+0x49e) [0x561c853167de]
> >   7: (AvlAllocator::_release(interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > > const&)+0x60) [0x561c85310b20]
> >   8: (HybridAllocator::release(interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > > const&)+0x3a) [0x561c853143ca]
> >   9: (BlueStore::_txc_release_alloc(BlueStore::TransContext*)+0x5f) [0x561c851ee83f]
> >   10: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x1be) [0x561c8522f4ae]
> >   11: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0xaa) [0x561c8522fe9a]
> >   12: (BlueStore::_kv_finalize_thread()+0x604) [0x561c85232ed4]
> >   13: (BlueStore::KVFinalizeThread::entry()+0xd) [0x561c852625ed]
> >   14: (()+0x7ea5) [0x7fcf840a2ea5]
> >   15: (clone()+0x6d) [0x7fcf82f6596d]
> >   NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
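
To compare further crashes against the one quoted above, a rough sketch like the following could pull the failed assertion and the named stack frames out of a dump. It assumes the dump is available as plain, unwrapped text (for example saved from the OSD log), and the function names are made up for illustration. Whether a matching signature really is the tracker issue Igor linked is his assessment; the check here is only a heuristic on the assertion and on `HybridAllocator` frames being present.

```python
import re

# Matches e.g. "...: 572: FAILED ceph_assert(available >= allocated)"
ASSERT_RE = re.compile(r"FAILED ceph_assert\((?P<expr>.*)\)")
# Matches named frames such as " 3: (HybridAllocator::_add_to_tree(...)+0x49e) [...]"
# and skips anonymous ones such as " 2: (()+0x4d8e45) [...]".
FRAME_RE = re.compile(r"^\s*\d+: \((?P<symbol>[A-Za-z_][\w:]*)")


def crash_signature(log_text):
    """Return (failed assertion or None, list of named frame symbols)."""
    match = ASSERT_RE.search(log_text)
    assertion = match.group("expr") if match else None
    frames = []
    for line in log_text.splitlines():
        frame = FRAME_RE.match(line)
        if frame:
            frames.append(frame.group("symbol"))
    return assertion, frames


def looks_like_hybrid_allocator_crash(log_text):
    """Heuristic: an assertion failed and HybridAllocator is in the backtrace."""
    assertion, frames = crash_signature(log_text)
    return assertion is not None and any(
        symbol.startswith("HybridAllocator::") for symbol in frames
    )


# Excerpt of the dump above; the source path is abbreviated here for brevity.
SAMPLE = """\
fastbmap_allocator_impl.h: 572: FAILED ceph_assert(available >= allocated)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x561c84cc2c7d]
 2: (()+0x4d8e45) [0x561c84cc2e45]
 3: (HybridAllocator::_add_to_tree(unsigned long, unsigned long)+0x49e) [0x561c853167de]
"""

if __name__ == "__main__":
    print(crash_signature(SAMPLE))
    print(looks_like_hybrid_allocator_crash(SAMPLE))
```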


-- 
The "UTF-8 problems" self-help group will, as an exception, meet in the
groüen Saal (large hall) this time.