Re: Bluestore OSD support in ceph-disk

> On Sep 15, 2016, at 11:43 PM, Kamble, Nitin A <Nitin.Kamble@xxxxxxxxxxxx> wrote:
> 
>> 
>> On Sep 15, 2016, at 11:54 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> 
>> 
>> The 128MB figure is mostly pulled out of a hat.  I suspect it will be 
>> reasonable, but a proper recommendation is going to depend on how we end 
>> up tuning rocksdb, and we've put that off until the metadata format is 
>> finalized and any rocksdb tuning we do will be meaningful.  We're pretty 
>> much at that point now...
>> 
>> Whatever it is, it should be related to the request rate, and perhaps the 
>> relative speed of the wal device and the db or main device.  The size of 
>> the slower devices shouldn't matter, though.
>> 
>> There are some bluefs perf counters that let you monitor what the wal 
>> device utilization is.  See 
>> 
>> b.add_u64(l_bluefs_wal_total_bytes, "wal_total_bytes",
>> 	    "Total bytes (wal device)");
>> b.add_u64(l_bluefs_wal_free_bytes, "wal_free_bytes",
>> 	    "Free bytes (wal device)");
>> 
>> which you can monitor via 'ceph daemon osd.N perf dump'.  If you 
>> discover anything interesting, let us know!
>> 
>> Thanks-
>> sage
> 
> I was able to build and deploy the latest master (commit: 9096ad37f2c0798c26d7784fb4e7a781feb72cb8) with partitioned bluestore. I struggled a bit to bring up the OSDs, as the available documentation for bringing up partitioned bluestore OSDs is still fairly primitive; once ceph-disk gets updated, this pain will go away. We will stress the cluster shortly, but so far I am delighted to see that, from ground zero, it is able to stand up on its own feet and reach HEALTH_OK without any errors. If I see any issues in our tests, I will share them here.
> 
> Thanks,
> Nitin
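
As a purely illustrative back-of-envelope for the sizing relationship Sage describes above (the WAL only has to absorb a short window of incoming writes before they are flushed to the db/main device, so its size tracks request rate rather than the size of the slower devices), here is a sketch in Python; the write rate, absorb window, and headroom figures are made-up assumptions, not recommendations:

# Purely illustrative numbers -- not a recommendation.  The point is that the
# WAL partition only needs to buffer a short burst of writes until they are
# flushed to the db/main device, so its size follows write rate, not the
# capacity of the slower devices.
sustained_write_mb_s = 50.0   # assumed small-write rate landing in the WAL
absorb_window_s      = 2.0    # assumed time the WAL must buffer before flush
headroom             = 2.0    # assumed safety factor

wal_mb = sustained_write_mb_s * absorb_window_s * headroom
print("rough WAL working set: %.0f MB" % wal_mb)   # ~200 MB with these guesses

Note that making the main device larger would not change this estimate at all, which is the point of the paragraph above.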

Out of 30 OSDs, one failed after about 1.5 hours of stress; the remaining 29 OSDs have been holding up fine for many hours.
If needed, I can provide the executable or the objdump output.


 ceph version v11.0.0-2309-g9096ad3 (9096ad37f2c0798c26d7784fb4e7a781feb72cb8)
 1: (()+0x892dd2) [0x7fb5bf2b8dd2]
 2: (()+0xf890) [0x7fb5bb4ae890]
 3: (gsignal()+0x37) [0x7fb5ba270187]
 4: (abort()+0x118) [0x7fb5ba271538]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x7fb5bf43a2f5]
 6: (BlueFS::_allocate(unsigned char, unsigned long, std::vector<bluefs_extent_t, std::allocator<bluefs_extent_t> >*)+0x8ad) [0x7fb5bf2735dd]
 7: (BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, unsigned long, unsigned long)+0xb4f) [0x7fb5bf27aa1f]
 8: (BlueFS::_fsync(BlueFS::FileWriter*, std::unique_lock<std::mutex>&)+0x29b) [0x7fb5bf27bc9b]
 9: (BlueRocksWritableFile::Sync()+0x4e) [0x7fb5bf29125e]
 10: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x139) [0x7fb5bf388699]
 11: (rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x7fb5bf389238]
 12: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool)+0x13cf) [0x7fb5bf2e0a2f]
 13: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x27) [0x7fb5bf2e1637]
 14: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x5b) [0x7fb5bf21a14b]
 15: (BlueStore::_kv_sync_thread()+0xf5a) [0x7fb5bf1e7ffa]
 16: (BlueStore::KVSyncThread::entry()+0xd) [0x7fb5bf1f5a6d]
 17: (()+0x80a4) [0x7fb5bb4a70a4]
 18: (clone()+0x6d) [0x7fb5ba32004d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
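
For reference, here is a minimal sketch of polling the wal_total_bytes / wal_free_bytes counters Sage pointed at above, using 'ceph daemon osd.N perf dump'. It assumes the counters show up under a "bluefs" section of the JSON output (guessed from the l_bluefs_ prefix) and that the ceph CLI is available on the OSD host:

#!/usr/bin/env python3
# Sketch only: poll the bluefs WAL counters for one OSD through the admin socket.
# Assumes 'ceph daemon osd.N perf dump' returns JSON with a "bluefs" section
# containing wal_total_bytes / wal_free_bytes (names taken from the snippet above).
import json
import subprocess
import sys
import time

osd_id = sys.argv[1] if len(sys.argv) > 1 else "0"   # e.g. ./wal_watch.py 12

while True:
    out = subprocess.check_output(["ceph", "daemon", "osd." + osd_id, "perf", "dump"])
    bluefs = json.loads(out.decode()).get("bluefs", {})
    total = bluefs.get("wal_total_bytes", 0)
    free = bluefs.get("wal_free_bytes", 0)
    if total:
        print("osd.%s wal: %d of %d bytes free (%.1f%%)"
              % (osd_id, free, total, 100.0 * free / total))
    time.sleep(10)

Something like this should make it easy to see whether the WAL device is filling up under load.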

Thanks,
Nitin

