I didn't look into it closely, but that almost certainly means that your
queue is reordering primary->replica replicated write messages.
-Sam

On Wed, Nov 4, 2015 at 8:54 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> I've got some rough code that replaces the token bucket queue in
> PrioritizedQueue.h with a weighted round robin queue, located at [1].
> Even though there are still some optimizations that can be done,
> running the fio job [2] I've seen about a 20% performance increase on
> spindles and about a 6% increase on SSDs (my hosts are CPU bound on
> SSD).
>
> The idea of this queue is to be fair to all OPs relative to their
> priority while at the same time reducing the overhead of each OP
> (queue and dequeue) from O(n) to closer to O(1).
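> To make the idea concrete, here is a minimal sketch of the approach
> (illustration only: the class name, the members, and the rule that a
> bucket's share per visit equals its priority value are all made up for
> this example; the real patch at [1] differs in the details). Each
> priority gets its own FIFO, and dequeue cycles through the priorities,
> granting each bucket up to its weighted share of OPs per visit before
> moving on:
>
> #include <algorithm>
> #include <cassert>
> #include <cstddef>
> #include <deque>
> #include <map>
>
> // Sketch of a weighted round robin queue, not the actual patch.
> template <typename T>
> class SimpleWRRQueue {
>   // priority -> FIFO of OPs at that priority; empty buckets are kept
>   // around so the cursor below stays valid
>   std::map<unsigned, std::deque<T> > buckets;
>   // round robin cursor, plus how many OPs the current bucket has been
>   // granted on this visit (its share per visit is its priority value)
>   typename std::map<unsigned, std::deque<T> >::iterator cur;
>   unsigned granted;
>   std::size_t total;
>
> public:
>   SimpleWRRQueue() : cur(buckets.end()), granted(0), total(0) {}
>
>   void enqueue(unsigned priority, T op) {
>     buckets[priority].push_back(op);  // touches a single bucket
>     ++total;
>   }
>
>   bool empty() const { return total == 0; }
>
>   T dequeue() {
>     assert(!empty());
>     // Skip to the next non-empty bucket once the current one is
>     // drained or has used up its weighted share for this visit
>     // (treat priority 0 as a share of 1 so it still makes progress).
>     while (cur == buckets.end() || cur->second.empty() ||
>            granted >= std::max(cur->first, 1u)) {
>       if (cur == buckets.end() || ++cur == buckets.end())
>         cur = buckets.begin();
>       granted = 0;
>     }
>     T op = cur->second.front();
>     cur->second.pop_front();
>     ++granted;
>     --total;
>     return op;
>   }
> }
>
> With weights like that, a priority 63 bucket can take up to 63 OPs per
> pass while a priority 1 bucket still gets one, so low priority OPs make
> progress every round instead of waiting on tokens. Dequeue can still
> scan past empty buckets, but the number of distinct priorities is small
> and fixed, so both operations stay close to O(1) in practice.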
> One issue I'm having is that under certain workloads, usually during
> recovery, I hit these asserts and need help pinpointing how to resolve
> them:
>
> osd/PG.cc: In function 'void PG::add_log_entry(const pg_log_entry_t&,
> ceph::bufferlist&)' thread 7f55d61fd700 time 2015-11-03 14:44:28.638112
> osd/PG.cc: 2923: FAILED assert(e.version > info.last_update)
> osd/PG.cc: In function 'void PG::add_log_entry(const pg_log_entry_t&,
> ceph::bufferlist&)' thread 7f55d7a00700 time 2015-11-03 14:44:28.637053
> osd/PG.cc: 2923: FAILED assert(e.version > info.last_update)
> ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x76) [0xc1e3a6]
> 2: ceph-osd() [0x7d5a7c]
> 3: (PG::append_log(std::vector > const&, eversion_t, eversion_t,
> ObjectStore::Transaction&, bool)+0x111) [0x7f7181]
> 4: (ReplicatedPG::log_operation(std::vector > const&,
> boost::optional&, eversion_t const&, eversion_t const&, bool,
> ObjectStore::Transaction*)+0xad) [0x8bfc7d]
> 5: (void ReplicatedBackend::sub_op_modify_impl(std::tr1::shared_ptr)+0x7b9)
> [0xa5e119]
> 6: (ReplicatedBackend::sub_op_modify(std::tr1::shared_ptr)+0x4a) [0xa4950a]
> 7: (ReplicatedBackend::handle_message(std::tr1::shared_ptr)+0x363) [0xa49923]
> 8: (ReplicatedPG::do_request(std::tr1::shared_ptr&,
> ThreadPool::TPHandle&)+0x159) [0x847ae9]
> 9: (OSD::dequeue_op(boost::intrusive_ptr, std::tr1::shared_ptr,
> ThreadPool::TPHandle&)+0x3cf) [0x690cef]
> 10: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x469) [0x691359]
> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x89e)
> [0xc0d8ae]
> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xc0fa00]
> 13: (()+0x80a4) [0x7f55f9edd0a4]
> 14: (clone()+0x6d) [0x7f55f843904d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>`, is
> needed to interpret this.
>
> I think this means that the log entry being appended is not newer than
> the PG's last_update (i.e., it arrived out of order), but I'm not sure
> how to rectify it. Any pushes in the right direction would be helpful.
>
> It seems that this queue is helping with recovery ops even with
> osd_max_backfills=20 under maximum client OPs, but I don't have good
> long-term data due to this issue. I think it has also impacted my SSD
> testing, since I lose one OSD during the test, which temporarily
> reduces performance.
>
> When looking through my code, please remember:
> 1. This may be the first time I've written C++ code, or it has been
> long enough that it feels like it.
> 2. There are still some optimizations that I know can be done, but I'm
> happy to have people share any optimization opportunities they see.
> 3. I'm trying to understand the reason for the assert and get pointers
> on how to resolve it.
> 4. It seems that multiple threads service the queue, keeping the
> individual queues pretty small. How can I limit the queue to one
> thread so all OPs have to go through a single queue? I'd like to see
> the difference that makes.
> 5. I'd appreciate any pointers for improving this code.
>
> Thank you,
> Robert LeBlanc
>
> [1] https://github.com/ceph/ceph/compare/hammer...rldleblanc:wrr-queue
> [2] [rbd-test]
> #readwrite=write
> #blocksize=4M
> runtime=600
> name=rbd-test
> readwrite=randrw
> bssplit=4k/85:32k/11:512/3:1m/1,4k/89:32k/10:512k/1
> rwmixread=72
> norandommap
> #size=1T
> #blocksize=4k
> ioengine=rbd
> rbdname=test5
> pool=ssd-pool
> clientname=admin
> iodepth=8
> numjobs=4
> thread
> group_reporting
> time_based
> #direct=1
> ramp_time=60
>
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1