Profiling OSD threads - durability and scalability questions

Hi Ceph devs,

We are running an fio benchmark against a 3-node Ceph cluster with a 4 KB object size, and we use the gdbpmp profiler (https://github.com/markhpc/gdbpmp) to analyze per-thread performance.

Based on the profiling report, I have two questions:

  1. In our setup, the bstore_kv_sync thread spends 98% of its time running asynchronous RocksDB transactions (the remaining 2% are synchronous). How does this align with Ceph's durability guarantees? What happens if the OSD fails after returning a success indication but before the WAL memory buffer is flushed to disk? Are you assuming the WAL buffer is flushed around the time the value is written to the memtable? While that is reasonable in practice, it cannot guarantee 100% durability. Am I missing something in the write path? (See the first sketch after these questions for what I mean by asynchronous vs. synchronous.)

  2. Each OSD has a single bstore_kv_sync thread and 16 tp_osd_tp threads. The bstore_kv_sync thread is always busy, while the tp_osd_tp threads are idle most of the time. Given that three of the RocksDB column families are sharded, and the sharding is configurable, why not run multiple (e.g. three) bstore_kv_sync threads, assuming they would rarely conflict? This could remove the RocksDB bottleneck and increase IOPS. (See the second sketch below for roughly what I have in mind.)
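
To make question 1 concrete, here is a minimal sketch of the sync/async distinction I am referring to, written against the plain RocksDB API. This is not BlueStore code; the function name and keys are made up, and I am not claiming this is exactly what the kv_sync thread does.

    #include <rocksdb/db.h>
    #include <rocksdb/options.h>
    #include <rocksdb/write_batch.h>

    // Plain RocksDB calls illustrating the sync/async distinction in
    // question 1. Not BlueStore code; keys/values are placeholders.
    rocksdb::Status illustrate_wal_sync(rocksdb::DB* db) {
      rocksdb::WriteBatch batch;
      batch.Put("object-key", "object-metadata");

      rocksdb::WriteOptions async_opts;
      async_opts.sync = false;   // WAL record reaches the OS page cache only;
                                 // no fsync/fdatasync before Write() returns
      rocksdb::Status s = db->Write(async_opts, &batch);
      if (!s.ok()) return s;

      rocksdb::WriteOptions sync_opts;
      sync_opts.sync = true;     // WAL is fsynced before Write() returns
      return db->Write(sync_opts, &batch);
    }

My durability question is about the window in which only the first kind of write has happened when the OSD has already acknowledged the client.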
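For question 2, this is a rough sketch of the arrangement I have in mind: one kv-sync-style thread per group of sharded column families, each submitting and syncing its own write batches. The shard assignment, function name, and keys are hypothetical and only meant to illustrate the proposal, not how Ceph is structured today.

    #include <rocksdb/db.h>
    #include <rocksdb/options.h>
    #include <rocksdb/write_batch.h>
    #include <thread>
    #include <vector>

    // Hypothetical sketch: one sync thread per group of sharded column
    // families, each issuing its own synced batch. Illustration only.
    void run_kv_sync_shards(
        rocksdb::DB* db,
        const std::vector<std::vector<rocksdb::ColumnFamilyHandle*>>& shards) {
      std::vector<std::thread> workers;
      for (const auto& cfs : shards) {
        workers.emplace_back([db, &cfs] {
          rocksdb::WriteBatch batch;
          for (auto* cf : cfs) {
            // keys are placeholders; each thread only touches its own CFs
            batch.Put(cf, "key-owned-by-this-shard", "value");
          }
          rocksdb::WriteOptions opts;
          opts.sync = true;        // each thread syncs its own batch
          db->Write(opts, &batch);
        });
      }
      for (auto& t : workers) t.join();
    }

Since the threads write to disjoint column families, I would expect few conflicts between them, which is the assumption behind the question.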

Thank you,
Eshcar
