Hi Somnath,

You mentioned: "There is still one global lock we have; this is to protect
pg_for_processing() and this we can't get rid of since we need to maintain
op order within a pg."

But for most object operations we only need to maintain ordering per
object. Why do we need to maintain op order within a whole PG? Can you
explain in detail?
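To make sure I am asking about the right thing, here is a minimal sketch of
the per-PG FIFO that I understand pg_for_processing() to be protecting. It
is purely illustrative, not the actual OSD code; ShardSketch, PgId and Op
are made-up names.

#include <deque>
#include <functional>
#include <map>
#include <mutex>

using PgId = int;
using Op   = std::function<void()>;

struct ShardSketch {
  std::mutex lock;                        // per-shard lock in this sketch
  std::map<PgId, std::deque<Op>> pg_ops;  // ops kept in arrival order per PG

  void enqueue(PgId pg, Op op) {
    std::lock_guard<std::mutex> g(lock);
    pg_ops[pg].push_back(std::move(op));  // submission order is preserved
  }

  // A worker takes the oldest op of a PG. Without the lock around this
  // FIFO, two workers could pick up op N and op N+1 of the same PG and
  // complete them out of order.
  bool dequeue_one(PgId pg, Op &out) {
    std::lock_guard<std::mutex> g(lock);
    auto it = pg_ops.find(pg);
    if (it == pg_ops.end() || it->second.empty())
      return false;
    out = std::move(it->second.front());
    it->second.pop_front();
    return true;
  }
};

My question is why the ordering has to be enforced at the PG level (one
FIFO per PG) rather than per object.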
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx
> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Somnath Roy
> Sent: Sunday, September 28, 2014 5:02 PM
> To: Dong Yuan
> Cc: ceph-devel
> Subject: RE: Latency Improvement Report for ShardedOpWQ
>
> Dong,
> This is mostly because of lock contention, maybe.
> You can tweak the number of shards of the sharded WQ to see whether it
> improves this number or not.
> There is still one global lock we have; this is to protect
> pg_for_processing(), and this we can't get rid of since we need to
> maintain op order within a PG. This could be increasing latency as well.
> I would suggest you measure this number at different stages within
> ShardedOpWQ::_process(), e.g. after dequeuing from pqueue, and after
> getting the PG lock and popping the ops from pg_for_processing().
>
> Also, keep in mind there is a context switch happening, and this could be
> expensive depending on the data copy etc. It's worth trying this
> experiment by pinning the OSD to actual physical cores.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Dong Yuan [mailto:yuandong1222@xxxxxxxxx]
> Sent: Sunday, September 28, 2014 12:19 AM
> To: Somnath Roy
> Cc: ceph-devel
> Subject: Re: Latency Improvement Report for ShardedOpWQ
>
> Hi Somnath,
>
> I totally agree with you.
>
> I read the code of the sharded TP and the new OSD OpWQ. In the new
> implementation there is no single lock for all PGs; instead, each lock
> covers a subset of PGs (am I right?). That is very useful for reducing
> lock contention and so increasing parallelism. It is awesome work!
>
> While working on the latency of a single IO (mainly 4K random writes), I
> noticed the OpWQ spends about 100+ us to transfer an IO from the msg
> dispatcher to an OpWQ worker thread. Do you have any idea how to reduce
> that time span?
>
> Thanks for your help.
> Dong.
>
> On 28 September 2014 13:46, Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote:
> > Hi Dong,
> > I don't think there is much benefit in the single-client scenario; a
> > single client has its own limitations. The benefit of the sharded TP is
> > that a single OSD scales much better as the number of clients grows,
> > since it increases parallelism (by reducing lock contention) at the
> > filestore level. A quick check could be like this:
> >
> > 1. Create a single-node, single-OSD cluster and put load on it with an
> > increasing number of clients, e.g. 1, 3, 5, 8, 10. A small workload
> > served from memory would be ideal.
> > 2. Compare the code with the sharded TP against, say, firefly. You
> > should see that firefly does not scale with the increasing number of
> > clients.
> > 3. Try top -H in the two cases; you should see more threads working in
> > parallel with the sharded TP than with firefly.
> >
> > Also, I am sure this latency result will not hold true under high load;
> > there you should see more contention and, as a result, more latency.
> >
> > Thanks & Regards
> > Somnath
> >
> > -----Original Message-----
> > From: ceph-devel-owner@xxxxxxxxxxxxxxx
> > [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Dong Yuan
> > Sent: Saturday, September 27, 2014 8:45 PM
> > To: ceph-devel
> > Subject: Latency Improvement Report for ShardedOpWQ
> >
> > ===== Test Purpose =====
> >
> > Measure whether, and by how much, the sharded OpWQ is better than the
> > traditional OpWQ for the random-write case.
> >
> > ===== Test Case =====
> >
> > 4K object WriteFull, 1w (10,000) times.
> >
> > ===== Test Method =====
> >
> > Put the following static probes into the code while running the tests,
> > to get the time span between enqueue into and dequeue from the OpWQ.
> >
> > Start: PG::enqueue_op, just before the osd->op_wq enqueue call
> > End: entry of OSD::dequeue_op
> >
> > ===== Test Result =====
> >
> > Traditional OpWQ: 109 us (avg), 40 us (min)
> > ShardedOpWQ: 97 us (avg), 32 us (min)
> >
> > ===== Test Conclusion =====
> >
> > No remarkable improvement in latency.
> >
> > --
> > Dong Yuan
> > Email: yuandong1222@xxxxxxxxx
>
> --
> Dong Yuan
> Email: yuandong1222@xxxxxxxxx
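P.S. For anyone who wants to reproduce Dong's measurement above, the probe
described in the report (a timestamp taken in PG::enqueue_op just before
the op is queued into the OpWQ, and another at the entry of
OSD::dequeue_op) boils down to something like the sketch below. It is
illustrative only; QueueLatencyProbe is a made-up helper, not code from
the tree, and the real test used static probes rather than fprintf.

#include <chrono>
#include <cstdio>

using Clock = std::chrono::steady_clock;

struct QueueLatencyProbe {
  Clock::time_point enqueue_ts;

  // would run right before the op is queued into the OpWQ
  void on_enqueue() { enqueue_ts = Clock::now(); }

  // would run at the entry of the worker-side dequeue path
  void on_dequeue() const {
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                  Clock::now() - enqueue_ts).count();
    std::fprintf(stderr, "OpWQ queue latency: %lld us\n",
                 static_cast<long long>(us));
  }
};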