> On Sep 15, 2016, at 11:43 PM, Kamble, Nitin A <Nitin.Kamble@xxxxxxxxxxxx> wrote:
>
>> On Sep 15, 2016, at 11:54 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>
>> The 128MB figure is mostly pulled out of a hat. I suspect it will be
>> reasonable, but a proper recommendation is going to depend on how we end
>> up tuning rocksdb, and we've put that off until the metadata format is
>> finalized and any rocksdb tuning we do will be meaningful. We're pretty
>> much at that point now...
>>
>> Whatever it is, it should be related to the request rate, and perhaps the
>> relative speed of the wal device and the db or main device. The size of
>> the slower devices shouldn't matter, though.
>>
>> There are some bluefs perf counters that let you monitor what the wal
>> device utilization is. See
>>
>>   b.add_u64(l_bluefs_wal_total_bytes, "wal_total_bytes",
>>             "Total bytes (wal device)");
>>   b.add_u64(l_bluefs_wal_free_bytes, "wal_free_bytes",
>>             "Free bytes (wal device)");
>>
>> which you can monitor via 'ceph daemon osd.N perf dump'. If you
>> discover anything interesting, let us know!
>>
>> Thanks-
>> sage
>
> I was able to build and deploy the latest master (commit:
> 9096ad37f2c0798c26d7784fb4e7a781feb72cb8) with partitioned bluestore. I
> struggled a bit to bring up the OSDs, as the available documentation for
> partitioned bluestore OSDs is still fairly primitive. Once ceph-disk gets
> updated, this pain will go away. We will stress the cluster shortly, but so
> far I am delighted to see that from ground zero it is able to stand up on
> its own feet to HEALTH_OK without any errors. If I see any issues in our
> tests, I will share them here.
>
> Thanks,
> Nitin

Out of 30 OSDs, one failed after 1.5 hours of stress. The remaining 29 OSDs
have been holding up fine for many hours. If needed, I can provide the
executable or the objdump output.

 ceph version v11.0.0-2309-g9096ad3 (9096ad37f2c0798c26d7784fb4e7a781feb72cb8)
 1: (()+0x892dd2) [0x7fb5bf2b8dd2]
 2: (()+0xf890) [0x7fb5bb4ae890]
 3: (gsignal()+0x37) [0x7fb5ba270187]
 4: (abort()+0x118) [0x7fb5ba271538]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x7fb5bf43a2f5]
 6: (BlueFS::_allocate(unsigned char, unsigned long, std::vector<bluefs_extent_t, std::allocator<bluefs_extent_t> >*)+0x8ad) [0x7fb5bf2735dd]
 7: (BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, unsigned long, unsigned long)+0xb4f) [0x7fb5bf27aa1f]
 8: (BlueFS::_fsync(BlueFS::FileWriter*, std::unique_lock<std::mutex>&)+0x29b) [0x7fb5bf27bc9b]
 9: (BlueRocksWritableFile::Sync()+0x4e) [0x7fb5bf29125e]
 10: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x139) [0x7fb5bf388699]
 11: (rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x7fb5bf389238]
 12: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool)+0x13cf) [0x7fb5bf2e0a2f]
 13: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x27) [0x7fb5bf2e1637]
 14: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x5b) [0x7fb5bf21a14b]
 15: (BlueStore::_kv_sync_thread()+0xf5a) [0x7fb5bf1e7ffa]
 16: (BlueStore::KVSyncThread::entry()+0xd) [0x7fb5bf1f5a6d]
 17: (()+0x80a4) [0x7fb5bb4a70a4]
 18: (clone()+0x6d) [0x7fb5ba32004d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Thanks,
Nitin
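
P.S. In case it helps anyone else watching the WAL counters Sage mentioned,
here is a rough, untested sketch of polling them via 'ceph daemon osd.N perf
dump'. It assumes the 'ceph' CLI is available on the OSD host and that the
perf dump output groups the counters under a "bluefs" section using the
"wal_total_bytes" / "wal_free_bytes" names from the snippet above; adjust the
keys if your output differs.

  #!/usr/bin/env python
  # Rough sketch: periodically report bluefs WAL utilization for one OSD
  # by parsing the JSON output of 'ceph daemon osd.N perf dump'.
  import json
  import subprocess
  import sys
  import time

  osd_id = sys.argv[1] if len(sys.argv) > 1 else "0"

  while True:
      out = subprocess.check_output(
          ["ceph", "daemon", "osd.%s" % osd_id, "perf", "dump"])
      # Assumes the counters live under a "bluefs" section in the dump.
      bluefs = json.loads(out.decode("utf-8")).get("bluefs", {})
      total = bluefs.get("wal_total_bytes", 0)
      free = bluefs.get("wal_free_bytes", 0)
      used = total - free
      pct = (100.0 * used / total) if total else 0.0
      print("osd.%s wal: %d / %d bytes used (%.1f%%)"
            % (osd_id, used, total, pct))
      time.sleep(10)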