Yes. Please open a ticket!
On Sat, Sep 9, 2017 at 11:16 AM Eric Eastman <eric.eastman@xxxxxxxxxxxxxx> wrote:
I am seeing OOM issues with some of my OSD nodes that I am testing with Bluestore on 12.2.0, so I decided to try the StupidAllocator to see if it has a smaller memory footprint, by setting the following in my ceph.conf:

bluefs_allocator = stupid
bluestore_cache_size_hdd = 1073741824
bluestore_cache_size_ssd = 1073741824

With these settings I am no longer seeing OOM errors, but on the node with these settings, overnight I have seen multiple Aborted messages in my log files:

grep Abort *log
ceph-osd.10.log:2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
ceph-osd.10.log: 0> 2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
ceph-osd.11.log:2017-09-09 11:39:16.835793 7fdcf6b08700 -1 *** Caught signal (Aborted) **
ceph-osd.11.log: 0> 2017-09-09 11:39:16.835793 7fdcf6b08700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 07:49:56.256899 7f89edf90700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log: 0> 2017-09-09 07:49:56.256899 7f89edf90700 -1 *** Caught signal (Aborted) **
ceph-osd.3.log:2017-09-09 08:13:16.919887 7f82f315e700 -1 *** Caught signal (Aborted) **
ceph-osd.7.log:2017-09-09 09:19:17.281950 7f77824cf700 -1 *** Caught signal (Aborted) **
ceph-osd.7.log: 0> 2017-09-09 09:19:17.281950 7f77824cf700 -1 *** Caught signal (Aborted) **

Before I open a ticket, I just want to know if the StupidAllocator is supported in Luminous.

A couple of examples of the Aborts are:

2017-09-09 12:39:27.044074 7f27f5f20700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1504975167035909, "job": 86, "event": "flush_started", "num_memtables": 1, "num_entries": 1015543, "num_deletes": 345553, "memory_usage": 260049176}
2017-09-09 12:39:27.044088 7f27f5f20700 4 rocksdb: [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293] [default] [JOB 86] Level-0 flush table #1825: started
2017-09-09 12:39:28.234651 7f27fff34700 -1 osd.10 pg_epoch: 3521 pg[1.3c7( v 3521'372186 (3456'369135,3521'372186] local-lis/les=3488/3490 n=2842 ec=578/66 lis/c 3488/3488 les/c/f 3490/3500/0 3488/3488/3477) [10,8,16] r=0 lpr=3488 crt=3521'372186 lcod 3521'372184 mlcod 3521'372184 active+clean+snaptrim snaptrimq=[111~2,115~2,13a~1,13c~3]] removing snap head
2017-09-09 12:39:28.573034 7f2816f45700 -1 *** Caught signal (Aborted) **
in thread 7f2816f45700 thread_name:msgr-worker-2
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
1: (()+0xa562f4) [0x5634e14882f4]
2: (()+0x11390) [0x7f281b2c5390]
3: (gsignal()+0x38) [0x7f281a261428]
4: (abort()+0x16a) [0x7f281a26302a]
5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f281aba384d]
6: (()+0x8d6b6) [0x7f281aba16b6]
7: (()+0x8d701) [0x7f281aba1701]
8: (()+0xb8d38) [0x7f281abccd38]
9: (()+0x76ba) [0x7f281b2bb6ba]
10: (clone()+0x6d) [0x7f281a33282d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-10000> 2017-09-09 12:39:05.878006 7f2817746700 1 -- 172.16.2.133:6804/1327479 <== osd.2 172.16.2.131:6800/1710 37506 ==== osd_repop(mds.0.19:101159707 1.2f1 e3521/3477) v2 ==== 998+0+46 (52256346 0 1629833233) 0x56359eb29000 con 0x563510c02000
-9999> 2017-09-09 12:39:05.878065 7f2816f45700 1 -- 10.15.2.133:6805/327479 <== mds.0 10.15.2.123:6800/2942775562 55580 ==== osd_op(mds.0.19:101159714 1.ec 1.ffad68ec (undecoded) ondisk+write+known_if_redirected+full_force e3521) v8 ==== 305+0+366 (2883828331 0 2609552142) 0x56355d9eb0c0 con 0x56355f455000

Second example:

2017-09-09 07:10:58.135527 7fa2d56a0700 4 rocksdb: [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:264] [default] [JOB 10] Flushing memtable with next log file: 2773
2017-09-09 07:10:58.262058 7fa2d56a0700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1504955458135538, "job": 10, "event": "flush_started", "num_memtables": 1, "num_entries": 935059, "num_deletes": 175946, "memory_usage": 260049888}
2017-09-09 07:10:58.262077 7fa2d56a0700 4 rocksdb: [/build/ceph-12.2.0/src/rocksdb/db/flush_job.cc:293] [default] [JOB 10] Level-0 flush table #2774: started
2017-09-09 07:10:58.565465 7fa2e96c8700 -1 *** Caught signal (Aborted) **
in thread 7fa2e96c8700 thread_name:bstore_kv_sync
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
1: (()+0xa562f4) [0x5579585362f4]
2: (()+0x11390) [0x7fa2faa45390]
3: (gsignal()+0x38) [0x7fa2f99e1428]
4: (abort()+0x16a) [0x7fa2f99e302a]
5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7fa2fa32384d]
6: (()+0x8d6b6) [0x7fa2fa3216b6]
7: (()+0x8d701) [0x7fa2fa321701]
8: (()+0x8d919) [0x7fa2fa321919]
9: (()+0x1230f) [0x7fa2fb60b30f]
10: (operator new[](unsigned long)+0x4e7) [0x7fa2fb62f4b7]
11: (rocksdb::Arena::AllocateNewBlock(unsigned long)+0x70) [0x557958939150]
12: (rocksdb::Arena::AllocateFallback(unsigned long, bool)+0x45) [0x5579589392d5]
13: (rocksdb::Arena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)+0x100) [0x557958939460]
14: (rocksdb::ConcurrentArena::AllocateAligned(unsigned long, unsigned long, rocksdb::Logger*)+0x175) [0x5579588a80d5]
15: (()+0xe01d03) [0x5579588e1d03]
16: (()+0xe024dd) [0x5579588e24dd]
17: (rocksdb::MemTable::Add(unsigned long, rocksdb::ValueType, rocksdb::Slice const&, rocksdb::Slice const&, bool, rocksdb::MemTablePostProcessInfo*)+0x109) [0x5579588a3629]
18: (rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x39c) [0x5579588db6cc]
19: (rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x5b7) [0x5579588d4fa7]
20: (rocksdb::WriteBatchInternal::InsertInto(rocksdb::autovector<rocksdb::WriteThread::Writer*, 8ul> const&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool)+0x14b) [0x5579588d8dcb]
21: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool)+0x14a7) [0x55795899e2c7]
22: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x2a) [0x55795899ed2a]
23: (RocksDBStore::submit_transaction(std::shared_ptr<KeyValueDB::TransactionImpl>)+0xaf) [0x55795847751f]
24: (BlueStore::_kv_sync_thread()+0x23dc) [0x55795841150c]
25: (BlueStore::KVSyncThread::entry()+0xd) [0x55795845453d]
26: (()+0x76ba) [0x7fa2faa3b6ba]
27: (clone()+0x6d) [0x7fa2f9ab282d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

If the StupidAllocator is supported, I will open up a ticket.

Eric
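
For anyone gathering the same information for a ticket, the `grep Abort *log` step can be extended into a small script that also records which thread aborted. This is only a rough sketch: the function name `summarize_aborts` is made up for illustration, and it assumes the log line layout matches the excerpts quoted in this message (timestamp, thread id, `-1 *** Caught signal (Aborted) **`, then an `in thread ... thread_name:...` line). It skips the repeated copy of each abort that appears in the "recent events" dump.

```python
import glob
import re
from typing import Iterable, List, Optional, Tuple

# Matches the live abort line and its repeat in the "recent events" dump,
# where the line is prefixed with an index such as "     0> ".
# Assumption: the timestamp/thread-id layout is as quoted in the message.
ABORT_RE = re.compile(
    r"^(?:\s*-?\d+>\s+)?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"([0-9a-f]+)\s+-1 \*\*\* Caught signal \(Aborted\) \*\*"
)
THREAD_RE = re.compile(r"in thread ([0-9a-f]+) thread_name:(\S+)")

Abort = Tuple[str, str, Optional[str]]  # (timestamp, thread id, thread name)


def summarize_aborts(lines: Iterable[str]) -> List[Abort]:
    """Return one entry per distinct abort, with the thread name if found."""
    seen = set()
    aborts: List[Abort] = []
    pending: Optional[int] = None  # index of an abort still missing its name
    for line in lines:
        m = ABORT_RE.match(line)
        if m:
            key = (m.group(1), m.group(2))
            if key in seen:
                # Same abort repeated inside the event dump; ignore it.
                pending = None
                continue
            seen.add(key)
            aborts.append((m.group(1), m.group(2), None))
            pending = len(aborts) - 1
            continue
        t = THREAD_RE.search(line)
        if t and pending is not None and aborts[pending][1] == t.group(1):
            ts, tid, _ = aborts[pending]
            aborts[pending] = (ts, tid, t.group(2))
            pending = None
    return aborts


if __name__ == "__main__":
    # Same file pattern as the grep in the message; run from the log directory.
    for path in sorted(glob.glob("ceph-osd.*.log")):
        with open(path, errors="replace") as fh:
            for ts, tid, name in summarize_aborts(fh):
                print(f"{path}: {ts} thread {tid} ({name or 'unknown'})")
```

Run against the logs quoted above, this would list each abort once per OSD log together with its thread name (for example `msgr-worker-2` or `bstore_kv_sync`), which may make it easier to see whether the aborts share a common thread.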
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com