Hi Somnath,

Thanks, very good point. However, one of the cloud applications in our data center will have multiple threads (Java code) flushing their data to one image at the same time. There are two locks involved in that path: one is the image exclusive_lock, the other is the object_map lock. I am afraid these two locks will hurt performance quite a bit for a multi-threaded application.

Regards,
James

On 2/13/17, 3:00 PM, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx> wrote:

James,
It was discussed earlier on ceph-devel that with exclusive lock enabled, multi-thread (multi-job) performance will suffer. You should increase the queue depth to increase parallelism, not the number of threads. I brought this up some time back but couldn't provide a use case where multiple clients access one image in parallel; it would be great if you have one. Exporting an rbd image for a database use case could be one (?).

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of LIU, Fei
Sent: Monday, February 13, 2017 2:37 PM
To: dillaman@xxxxxxxxxx
Cc: Ceph Development
Subject: Re: rbd performance drop a lot with objectmap

Hi Jason,

That makes sense. By the way, we initially did a random write test with qd=1; performance dropped almost 3x. We also ran the fio write test below 10 times against one image, with each fio write run taking 30 seconds:

[global]
ioengine=rbd
clientname=admin
pool=test-pool
rbdname=image-with-objmap
rw=write
bs=16k
direct=1
runtime=1200
ramp_time=30
group_reporting
time_based

[with_objectmap_rbd_iodepth1_numjobs1]
iodepth=128
numjobs=1

IOPS increased from 180 to 1200 by the end of the 10 fio rbd write runs. Once the object map has warmed up, latency drops a lot and IOPS increases accordingly.

However, we also tested random writes with rbd bench, and there we found performance dropped almost 10x once more contention was involved:

1. rbd create test-pool/no-map-image --size 100G --object-size 16K --image-format 2 --image-feature layering
2. rbd create test-pool/with-map-image --size 100G --object-size 16K --image-format 2 --image-feature layering --image-feature exclusive-lock --image-feature object-map

Using rbd bench to write 1G of data randomly:

rbd bench-write -p test-pool --image no-map-image --io-size 16K --io-threads 16 --io-total 1G --io-pattern rand

We found performance dropped almost 10x in terms of IOPS, and it gets worse with more jobs. That makes us wonder whether the lock is the killing factor. Any thoughts?
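In case it helps to reproduce this elsewhere, here is a rough sketch (bash) of one way to drive the comparison. It is only a sketch: it reuses the pool and image names from the commands above, and the feature toggling at the end is just one option for ruling the object map in or out on the same image instead of across two images.

# Sketch only: run the identical random-write bench against both images so the
# object-map/exclusive-lock feature set is the only difference.
for img in no-map-image with-map-image; do
    rbd info test-pool/${img} | grep features
    rbd bench-write -p test-pool --image ${img} \
        --io-size 16K --io-threads 16 --io-total 1G --io-pattern rand
done

# Optionally, toggle the features on one image instead of using two images
# (object-map has to be disabled before exclusive-lock, since it depends on it):
rbd feature disable test-pool/with-map-image object-map
rbd feature disable test-pool/with-map-image exclusive-lock
# ...rerun the bench, then restore the features and rebuild the map:
rbd feature enable test-pool/with-map-image exclusive-lock
rbd feature enable test-pool/with-map-image object-map
rbd object-map rebuild test-pool/with-map-image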
Regards,
James

On 2/12/17, 9:59 AM, "Jason Dillaman" <jdillama@xxxxxxxxxx> wrote:

On Fri, Feb 10, 2017 at 9:50 AM, LIU, Fei <james.liu@xxxxxxxxxxxxxxx> wrote:
> With FIO single job queue depth 1 (W/ vs W/O), IOPS drops 3 times and
> latency increases 3 times. With more jobs, IOPS drops further and
> latency increases higher and higher.

Assuming you have a random write workload, with a QD=1 it is entirely expected for the first write to an object to incur a performance penalty, since it requires an additional round-trip operation to the backing OSDs. Since you only hit this penalty for the very first write to the object, its cost is amortized over future writes. This is similar to cloned images and the amortized cost of copying up the backing parent object to the clone on the first write.

> The objectmap_locker in the pre and post steps and the object map update per IO
> really hurt performance. A lockless queue and a new way of caching the object map?
> Any thoughts?

By definition, with a QD=1 there is zero contention on that lock. The lock is really only held for a minuscule amount of time and is dropped while the OSD operation is in progress. Do you actually have any performance metrics to back up this claim? Note that the post state is only hit when you issue a remove / trim / discard operation.

--
Jason
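P.S. One rough way to separate the per-object first-write cost from any lock overhead, reusing the test-pool/with-map-image names from your commands (a sketch only): pre-populate the object map by touching every object once, then rerun the random-write comparison.

# Sketch: touch every object once so the object map is fully marked as existing.
# (Writing the whole 100G image is heavy; a smaller test image with the same
# features would do the same job.)
rbd bench-write -p test-pool --image with-map-image \
    --io-size 4M --io-threads 16 --io-total 100G --io-pattern seq

# Then repeat the random-write run and compare against no-map-image:
rbd bench-write -p test-pool --image with-map-image \
    --io-size 16K --io-threads 16 --io-total 1G --io-pattern rand

If the gap closes with a warm object map, you were measuring the first-write round trip; if the drop persists, that would be the kind of measurement worth looking at.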