Thank you for confirmation. Hopefully it will be approved in bodhi (you
can leave feedback here:
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-0eda4297eb
to help it along) soon, and new docker images can be built with the
older version.

On Tue, Mar 9, 2021 at 10:57 AM Andrej Filipcic <andrej.filipcic@xxxxxx> wrote:
>
> just confirming, crashes are gone with gperftools-libs-2.7-8.el8.x86_64.rpm
>
> Cheers,
> Andrej
>
> On 09/03/2021 16:52, Andrej Filipcic wrote:
> >
> > Hi,
> >
> > I was checking that bug yesterday, yes, and it smells the same.
> >
> > I will give a try to the epel one,
> >
> > Thanks
> > Andrej
> >
> > On 09/03/2021 16:44, Dan van der Ster wrote:
> >> Hi Andrej,
> >>
> >> I wonder if this is another manifestation of the buggy gperftools-libs
> >> v2.8 bug, e.g. https://tracker.ceph.com/issues/49618
> >>
> >> If so, there is a fixed (downgraded) version in epel-testing now.
> >>
> >> Cheers, Dan
> >>
> >> On Tue, Mar 9, 2021 at 4:36 PM Andrej Filipcic
> >> <andrej.filipcic@xxxxxx> wrote:
> >>>
> >>> Hi,
> >>>
> >>> under heavy load our cluster is experiencing frequent OSD crashes. Is
> >>> this a known bug or should I report it? Any workarounds? It looks to be
> >>> highly correlated with memory tuning.
> >>>
> >>> it happens with both nautilus 14.2.16 and octopus 15.2.9. I have forced
> >>> the bitmap bluefs and bluestore allocator.
> >>>
> >>> the cluster is ~60 nodes with 256GB ram and 25Gb NICs, ~1600 OSDs on
> >>> 100g network. Typically it is happening when the traffic is above
> >>> 100GB/s.
> >>>
> >>> Best regards,
> >>> Andrej
> >>>
> >>>
> >>> -14> 2021-03-09T14:10:30.105+0100 7fc128e05700 10 monclient: tick
> >>> -13> 2021-03-09T14:10:30.105+0100 7fc128e05700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-03-09T14:10:00.107344+0100)
> >>> -12> 2021-03-09T14:10:30.210+0100 7fc119412700 5 osd.209 9539 heartbeat osd_stat(store_statfs(0xe68762a0000/0x40000000/0xe8d7fc00000, data 0x24b195d923/0x24c97c0000, compress 0x0/0x0/0x0, omap 0xf5dca, meta 0x3ff0a236), peers [6,8,11,21,22,23,24,25,26,27,28,29,34,35,37,38,65,86,87,90,120,128,129,135,136,140,150,153,154,160,184,188,192,193,203,208,210,217,229,233,242,248,254,256,275,277,282,290,311,313,324,326,331,339,348,369,409,411,413,466,477,532,535,538,539,542,544,546,548,552,554,556,558,561,576,580,600,601,604,612,614,624,625,631,657,689,695,704,717,738,739,740,766,790,810,812,833,839,890,891,895,903,909,916,926,927,946,960,965,991,1050,1055,1062,1064,1067,1069,1072,1073,1075,1077,1078,1079,1095,1100,1117,1125,1127,1141,1148,1149,1153,1155,1195,1201,1202,1215,1229,1238,1253,1260,1283,1290,1298,1303,1329,1330,1349,1350,1388,1389,1422,1423,1430,1431,1434,1448,1455,1478,1479,1485,1488,1494,1497,1506,1516,1561,1564,1573,1574,1580] op hist [0,0,0,1,3,4,15,24,43,64,102,117])
> >>> -11> 2021-03-09T14:10:30.468+0100 7fc137cc0700 10 monclient: handle_auth_request added challenge on 0x55c08becf800
> >>> -10> 2021-03-09T14:10:30.543+0100 7fc1374bf700 10 monclient: handle_auth_request added challenge on 0x55c08becf400
> >>> -9> 2021-03-09T14:10:30.712+0100 7fc1384c1700 10 monclient: handle_auth_request added challenge on 0x55c08becec00
> >>> -8> 2021-03-09T14:10:31.029+0100 7fc137cc0700 10 monclient: handle_auth_request added challenge on 0x55c08becfc00
> >>> -7> 2021-03-09T14:10:31.033+0100 7fc12ca33700 5 prioritycache tune_memory target: 7264711979 mapped: 1564606464 unmapped: 47874048 heap: 1612480512 old mem: 5369698813 new mem: 5369698813
> >>> -6> 2021-03-09T14:10:31.105+0100 7fc128e05700 10 monclient: tick
> >>> -5> 2021-03-09T14:10:31.105+0100 7fc128e05700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-03-09T14:10:01.107451+0100)
> >>> -4> 2021-03-09T14:10:31.574+0100 7fc1374bf700 10 monclient: handle_auth_request added challenge on 0x55c0b69a8000
> >>> -3> 2021-03-09T14:10:32.036+0100 7fc12ca33700 5 prioritycache tune_memory target: 7264711979 mapped: 1708449792 unmapped: 46637056 heap: 1755086848 old mem: 5369698813 new mem: 5369698813
> >>> -2> 2021-03-09T14:10:32.106+0100 7fc128e05700 10 monclient: tick
> >>> -1> 2021-03-09T14:10:32.106+0100 7fc128e05700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2021-03-09T14:10:02.107524+0100)
> >>> 0> 2021-03-09T14:10:32.661+0100 7fc1384c1700 -1 *** Caught signal (Aborted) **
> >>> in thread 7fc1384c1700 thread_name:msgr-worker-0
> >>>
> >>> ceph version 15.2.9 (357616cbf726abb779ca75a551e8d02568e15b17) octopus (stable)
> >>> 1: (()+0x12b20) [0x7fc13cc20b20]
> >>> 2: (gsignal()+0x10f) [0x7fc13b8847ff]
> >>> 3: (abort()+0x127) [0x7fc13b86ec35]
> >>> 4: (()+0x9009b) [0x7fc13c23a09b]
> >>> 5: (()+0x9653c) [0x7fc13c24053c]
> >>> 6: (()+0x96597) [0x7fc13c240597]
> >>> 7: (()+0x967f8) [0x7fc13c2407f8]
> >>> 8: (ceph::buffer::v15_2_0::create_aligned_in_mempool(unsigned int, unsigned int, int)+0x229) [0x55c0669dce49]
> >>> 9: (ceph::buffer::v15_2_0::create_aligned(unsigned int, unsigned int)+0x26) [0x55c0669dcf46]
> >>> 10: (ceph::buffer::v15_2_0::create_small_page_aligned(unsigned int)+0x55) [0x55c0669dd8e5]
> >>> 11: (ProtocolV1::read_message_data_prepare()+0x368) [0x55c066b78ac8]
> >>> 12: (ProtocolV1::read_message_middle()+0x130) [0x55c066b78c90]
> >>> 13: (ProtocolV1::handle_message_front(char*, int)+0x2f4) [0x55c066b79624]
> >>> 14: (()+0xf72bed) [0x55c066b72bed]
> >>> 15: (AsyncConnection::process()+0x8a9) [0x55c066b6fa39]
> >>> 16: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xcb7) [0x55c0669c41b7]
> >>> 17: (()+0xdc979c) [0x55c0669c979c]
> >>> 18: (()+0xc2ba3) [0x7fc13c26cba3]
> >>> 19: (()+0x814a) [0x7fc13cc1614a]
> >>> 20: (clone()+0x43) [0x7fc13b949f23]
> >>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >>>
> >>>
> >>> --- logging levels ---
> >>> 0/ 5 none
> >>> 0/ 1 lockdep
> >>> 0/ 1 context
> >>> 1/ 1 crush
> >>> 1/ 5 mds
> >>> 1/ 5 mds_balancer
> >>> 1/ 5 mds_locker
> >>> 1/ 5 mds_log
> >>> 1/ 5 mds_log_expire
> >>> 1/ 5 mds_migrator
> >>> 0/ 1 buffer
> >>> 0/ 1 timer
> >>> 0/ 1 filer
> >>> 0/ 1 striper
> >>> 0/ 1 objecter
> >>> 0/ 5 rados
> >>> 0/ 5 rbd
> >>> 0/ 5 rbd_mirror
> >>> 0/ 5 rbd_replay
> >>> 0/ 5 rbd_rwl
> >>> 0/ 5 journaler
> >>> 0/ 5 objectcacher
> >>> 0/ 5 immutable_obj_cache
> >>> 0/ 5 client
> >>> 1/ 5 osd
> >>> 0/ 5 optracker
> >>> 0/ 5 objclass
> >>> 1/ 3 filestore
> >>> 1/ 3 journal
> >>> 0/ 0 ms
> >>> 1/ 5 mon
> >>> 0/10 monc
> >>> 1/ 5 paxos
> >>> 0/ 5 tp
> >>> 1/ 5 auth
> >>> 1/ 5 crypto
> >>> 1/ 1 finisher
> >>> 1/ 1 reserver
> >>> 1/ 5 heartbeatmap
> >>> 1/ 5 perfcounter
> >>> 1/ 5 rgw
> >>> 1/ 5 rgw_sync
> >>> 1/10 civetweb
> >>> 1/ 5 javaclient
> >>> 1/ 5 asok
> >>> 1/ 1 throttle
> >>> 0/ 0 refs
> >>> 1/ 5 compressor
> >>> 1/ 5 bluestore
> >>> 1/ 5 bluefs
> >>> 1/ 3 bdev
> >>> 1/ 5 kstore
> >>> 4/ 5 rocksdb
> >>> 4/ 5 leveldb
> >>> 4/ 5 memdb
> >>> 1/ 5 fuse
> >>> 1/ 5 mgr
> >>> 1/ 5 mgrc
> >>> 1/ 5 dpdk
> >>> 1/ 5 eventtrace
> >>> 1/ 5 prioritycache
> >>> 0/ 5 test
> >>> -2/-2 (syslog threshold)
> >>> -1/-1 (stderr threshold)
> >>> --- pthread ID / name mapping for recent threads ---
> >>> 7fc119412700 / osd_srv_heartbt
> >>> 7fc119c13700 / tp_osd_tp
> >>> 7fc11a414700 / tp_osd_tp
> >>> 7fc11ac15700 / tp_osd_tp
> >>> 7fc11b416700 / tp_osd_tp
> >>> 7fc11bc17700 / tp_osd_tp
> >>> 7fc124c29700 / ms_dispatch
> >>> 7fc125c2b700 / rocksdb:dump_st
> >>> 7fc12822a700 / bstore_kv_sync
> >>> 7fc128e05700 / safe_timer
> >>> 7fc129e07700 / ms_dispatch
> >>> 7fc12ca33700 / bstore_mempool
> >>> 7fc133446700 / safe_timer
> >>> 7fc1374bf700 / msgr-worker-2
> >>> 7fc137cc0700 / msgr-worker-1
> >>> 7fc1384c1700 / msgr-worker-0
> >>> max_recent 10000
> >>> max_new 1000
> >>>
> >>> --
> >>> _____________________________________________________________
> >>> prof. dr. Andrej Filipcic, E-mail: Andrej.Filipcic@xxxxxx
> >>> Department of Experimental High Energy Physics - F9
> >>> Jozef Stefan Institute, Jamova 39, P.o.Box 3000
> >>> SI-1001 Ljubljana, Slovenia
> >>> Tel.: +386-1-477-3674 Fax: +386-1-425-7074
> >>> -------------------------------------------------------------
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx