After moving back to tcmalloc, my random crash issues have been resolved. I would advise disabling jemalloc support for BlueStore, since it's not stable or safe... it seems risky to allow this?
_____________________________________________
Tyler Bishop
EST 2007
O: 513-299-7108 x1000
This email is intended only for the recipient(s) above and/or otherwise authorized personnel. The information contained herein and attached is confidential and the property of Beyond Hosting. Any unauthorized copying, forwarding, printing, and/or disclosing any information related to this email is prohibited. If you received this message in error, please contact the sender and destroy all copies of this email and any attachment(s).
On Mon, Aug 27, 2018 at 11:15 PM Tyler Bishop <tyler.bishop@xxxxxxxxxxxxxxxxx> wrote:
I bumped another post from earlier in the year. I got this reply:
Adam Tygart <mozes@xxxxxxx>
This issue was related to using jemalloc. Jemalloc is not as well
tested with BlueStore and led to lots of segfaults. We moved back to
the default of tcmalloc with BlueStore and these stopped.
Check /etc/sysconfig/ceph under RHEL based distros.
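A quick way to check is to grep the sysconfig file for an active jemalloc preload. This is only a sketch: it assumes jemalloc was enabled the common way on RHEL-based systems, via an LD_PRELOAD line in /etc/sysconfig/ceph, and the library path in the comment may differ on your distro.

```shell
# Hedged sketch: detect an active jemalloc preload in /etc/sysconfig/ceph.
# Assumes jemalloc was enabled via a line like:
#   LD_PRELOAD=/usr/lib64/libjemalloc.so.1
check_jemalloc() {
    # $1: path to the sysconfig file (defaults to /etc/sysconfig/ceph)
    sysconfig="${1:-/etc/sysconfig/ceph}"
    # Match only uncommented lines mentioning jemalloc.
    if grep -q '^[^#]*jemalloc' "$sysconfig" 2>/dev/null; then
        echo "jemalloc enabled"
    else
        echo "tcmalloc (default)"
    fi
}

check_jemalloc "$@"
```

If it reports jemalloc enabled, comment out that line and restart the OSDs so they fall back to the default tcmalloc.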
---I had enabled jemalloc in the sysconfig previously. Disabled that and now appear to have stable OSDs.
On Mon, Aug 27, 2018 at 11:13 PM Alfredo Daniel Rezinovsky <alfredo.rezinovsky@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Have you created the blockdb partitions or LVM manually?
What size?
On 27/08/18 23:48, Tyler Bishop wrote:
My host has 256 GB of RAM; 62 GB is used under the heaviest I/O workload.
On Mon, Aug 27, 2018 at 10:36 PM Alfredo Daniel Rezinovsky <alfredo.rezinovsky@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
I had the block.db on SSD, with 3 OSDs per host (8 GB RAM) and the default 3 GB bluestore_cache_size_ssd.
I stopped having inconsistencies after dropping the cache to 1 GB.
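For reference, that change amounts to something like the fragment below in ceph.conf (1 GiB expressed in bytes). This is a sketch for Luminous-era releases; verify the option name against your version, since later releases moved to an overall osd_memory_target instead.

```ini
[osd]
# Shrink the BlueStore cache for SSD-backed OSDs from the 3 GiB
# default down to 1 GiB (value is in bytes).
bluestore_cache_size_ssd = 1073741824
```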
On 27/08/18 23:32, Tyler Bishop wrote:
I'm having a constant segfault issue under I/O load with my newly created BlueStore deployment.
Setup is a 28 GB SSD LV for block.db and a 6 TB spinner for data.
Config:

[global]
fsid = REDACTED
mon_initial_members = cephmon-1001, cephmon-1002, cephmon-1003
mon_host = 10.20.142.5,10.20.142.6,10.20.142.7
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

# Fixes issue where image is created with newer-than-supported features enabled.
rbd_default_features = 3

# Debug Tuning
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcacher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

[osd]
osd_mkfs_type = xfs
osd_mount_options_xfs = rw,noatime,nodiratime,inode64,logbsize=256k,delaylog
osd_mkfs_options_xfs = -f -i size=2048
osd_journal_size = 10240
filestore_queue_max_ops = 1000
filestore_queue_max_bytes = 1048576000
filestore_max_sync_interval = 10
filestore_merge_threshold = 500
filestore_split_multiple = 100
osd_op_shard_threads = 6
journal_max_write_entries = 5000
journal_max_write_bytes = 1048576000
journal_queue_max_ops = 3000
journal_queue_max_bytes = 1048576000
ms_dispatch_throttle_bytes = 1048576000
objecter_inflight_op_bytes = 1048576000
public_network = 10.20.142.0/24
cluster_network = 10.20.136.0/24
osd_disk_thread_ioprio_priority = 7
osd_disk_thread_ioprio_class = idle
osd_max_backfills = 2
osd_recovery_sleep = 0.10

[client]
rbd_cache = false
rbd_cache_size = 33554432
rbd_cache_target_dirty = 16777216
rbd_cache_max_dirty = 25165824
rbd_cache_max_dirty_age = 2
rbd_cache_writethrough_until_flush = false
--------
2018-08-28 02:31:30.961954 7f64a895a700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/flush_job.cc:319] [default] [JOB 19] Level-0 flush table #688: 6121532 bytes OK
2018-08-28 02:31:30.962476 7f64a895a700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_files.cc:242] adding log 681 to recycle list
2018-08-28 02:31:30.962495 7f64a895a700 4 rocksdb: (Original Log Time 2018/08/28-02:31:30.961973) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/memtable_list.cc:360] [default] Level-0 commit table #688 started
2018-08-28 02:31:30.962501 7f64a895a700 4 rocksdb: (Original Log Time 2018/08/28-02:31:30.962413) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/memtable_list.cc:383] [default] Level-0 commit table #688: memtable #1 done
2018-08-28 02:31:30.962505 7f64a895a700 4 rocksdb: (Original Log Time 2018/08/28-02:31:30.962432) EVENT_LOG_v1 {"time_micros": 1535423490962423, "job": 19, "event": "flush_finished", "lsm_state": [1, 4, 1, 0, 0, 0, 0], "immutable_memtables": 0}
2018-08-28 02:31:30.962509 7f64a895a700 4 rocksdb: (Original Log Time 2018/08/28-02:31:30.962458) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_compaction_flush.cc:132] [default] Level summary: base level 1 max bytes base 268435456 files[1 4 1 0 0 0 0] max score 0.84
2018-08-28 02:31:30.962517 7f64a895a700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_files.cc:388] [JOB 19] Try to delete WAL files size 258068015, prev total WAL file size 260608480, number of live WAL files 2.
2018-08-28 02:32:06.102335 7f64b917b700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_write.cc:684] reusing log 681 from recycle list
2018-08-28 02:32:06.102473 7f64b917b700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_write.cc:725] [default] New memtable created with log file: #689. Immutable memtables: 0.
2018-08-28 02:32:06.102542 7f64a895a700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_compaction_flush.cc:49] [JOB 20] Syncing log #687
2018-08-28 02:32:06.103394 7f64a895a700 4 rocksdb: (Original Log Time 2018/08/28-02:32:06.102527) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_compaction_flush.cc:1158] Calling FlushMemTableToOutputFile with column family [default], flush slots available 1, compaction slots allowed 1, compaction slots scheduled 1
2018-08-28 02:32:06.103407 7f64a895a700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/flush_job.cc:264] [default] [JOB 20] Flushing memtable with next log file: 689
2018-08-28 02:32:06.103435 7f64a895a700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1535423526103422, "job": 20, "event": "flush_started", "num_memtables": 1, "num_entries": 97689, "num_deletes": 21335, "memory_usage": 260069984}
2018-08-28 02:32:06.103446 7f64a895a700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/flush_job.cc:293] [default] [JOB 20] Level-0 flush table #690: started
2018-08-28 02:32:06.155755 7f64a895a700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1535423526155726, "cf_name": "default", "job": 20, "event": "table_file_creation", "file_number": 690, "file_size": 6343137, "table_properties": {"data_size": 6153638, "index_size": 65232, "filter_size": 123278, "raw_key_size": 2289031, "raw_average_key_size": 52, "raw_value_size": 5374531, "raw_average_value_size": 122, "num_data_blocks": 1047, "num_entries": 43785, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "21429", "kMergeOperands": "220"}}
2018-08-28 02:32:06.155776 7f64a895a700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/flush_job.cc:319] [default] [JOB 20] Level-0 flush table #690: 6343137 bytes OK
2018-08-28 02:32:06.156214 7f64a895a700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_files.cc:242] adding log 687 to recycle list
2018-08-28 02:32:06.156225 7f64a895a700 4 rocksdb: (Original Log Time 2018/08/28-02:32:06.155790) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/memtable_list.cc:360] [default] Level-0 commit table #690 started
2018-08-28 02:32:06.156229 7f64a895a700 4 rocksdb: (Original Log Time 2018/08/28-02:32:06.156164) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/memtable_list.cc:383] [default] Level-0 commit table #690: memtable #1 done
2018-08-28 02:32:06.156239 7f64a895a700 4 rocksdb: (Original Log Time 2018/08/28-02:32:06.156178) EVENT_LOG_v1 {"time_micros": 1535423526156172, "job": 20, "event": "flush_finished", "lsm_state": [2, 4, 1, 0, 0, 0, 0], "immutable_memtables": 0}
2018-08-28 02:32:06.156244 7f64a895a700 4 rocksdb: (Original Log Time 2018/08/28-02:32:06.156199) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_compaction_flush.cc:132] [default] Level summary: base level 1 max bytes base 268435456 files[2 4 1 0 0 0 0] max score 0.84
2018-08-28 02:32:06.156252 7f64a895a700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl_files.cc:388] [JOB 20] Try to delete WAL files size 257866117, prev total WAL file size 259275521, number of live WAL files 2.
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-- Alfredo Daniel Rezinovsky, Director of Information and Communication Technologies, Facultad de Ingeniería - Universidad Nacional de Cuyo