Hi Somnath,

Thanks, very good point. However, one of the cloud applications in our data center will have multiple threads (Java code) flushing their data to one image at the same time. There are two locks involved in that path: one is the image exclusive_lock, the other is the object_map lock. I am afraid these two locks will hurt performance quite a bit for a multi-threaded application.

Regards,
James

On 2/13/17, 3:00 PM, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx> wrote:

James,
It was discussed earlier on ceph-devel that with exclusive lock enabled, multi-thread (multi-job) performance will suffer. You should increase the queue depth to increase parallelism, not the number of threads. I brought this up some time back but couldn't provide a use case where multiple clients access one image in parallel; it would be great if you have one. Exporting an rbd image for a database use case could be one (?).

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of LIU, Fei
Sent: Monday, February 13, 2017 2:37 PM
To: dillaman@xxxxxxxxxx
Cc: Ceph Development
Subject: Re: rbd performance drop a lot with objectmap

Hi Jason,

That makes sense. By the way, we initially did a random write test with qd=1; performance dropped almost 3x. We also ran the fio write test below 10 times against one image, with each fio write run taking 30 seconds:

[global]
ioengine=rbd
clientname=admin
pool=test-pool
rbdname=image-with-objmap
rw=write
bs=16k
direct=1
runtime=1200
ramp_time=30
group_reporting
time_based

[with_objectmap_rbd_iodepth1_numjobs1]
iodepth=128
numjobs=1

IOPS increased from 180 to 1200 by the end of the 10 fio rbd write runs. Once the object map has warmed up, latency drops a lot and IOPS increases accordingly.

However, we also tested random writes with rbd bench, and there we found performance dropped almost 10x once more contention was involved:

1. rbd create test-pool/no-map-image --size 100G --object-size 16K --image-format 2 --image-feature layering
2. rbd create test-pool/with-map-image --size 100G --object-size 16K --image-format 2 --image-feature layering --image-feature exclusive-lock --image-feature object-map

Using rbd bench to write 1G of data randomly:

rbd bench-write -p test-pool --image no-map-image --io-size 16K --io-threads 16 --io-total 1G --io-pattern rand

We found performance dropped almost 10x in terms of IOPS, and it gets worse with more jobs. That makes us wonder whether the lock is the killing factor. Any thoughts?
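In case it helps to reproduce this elsewhere, here is a rough sketch (bash) of one way to drive the comparison. It is only a sketch: it reuses the pool and image names from the commands above, and the feature toggling at the end is just one option for ruling the object map in or out on the same image instead of across two images.

# Sketch only: run the identical random-write bench against both images so the
# object-map/exclusive-lock feature set is the only difference.
for img in no-map-image with-map-image; do
    rbd info test-pool/${img} | grep features
    rbd bench-write -p test-pool --image ${img} \
        --io-size 16K --io-threads 16 --io-total 1G --io-pattern rand
done

# Optionally, toggle the features on one image instead of using two images
# (object-map has to be disabled before exclusive-lock, since it depends on it):
rbd feature disable test-pool/with-map-image object-map
rbd feature disable test-pool/with-map-image exclusive-lock
# ...rerun the bench, then restore the features and rebuild the map:
rbd feature enable test-pool/with-map-image exclusive-lock
rbd feature enable test-pool/with-map-image object-map
rbd object-map rebuild test-pool/with-map-image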
Regards,
James

On 2/12/17, 9:59 AM, "Jason Dillaman" <jdillama@xxxxxxxxxx> wrote:

On Fri, Feb 10, 2017 at 9:50 AM, LIU, Fei <james.liu@xxxxxxxxxxxxxxx> wrote:
> With FIO single job queue depth 1 (W/ vs W/O), IOPS drops 3 times and
> latency increases 3 times. With more jobs, IOPS drops further and
> latency increases higher and higher.

Assuming you have a random write workload, with a QD=1 it is entirely expected for the first write to an object to incur a performance penalty, since it requires an additional round-trip operation to the backing OSDs. Since you only hit this penalty for the very first write to the object, its cost is amortized over future writes. This is similar to cloned images and the amortized cost of copying up the backing parent object to the clone on the first write.

> The objectmap_locker in the pre and post steps and the object map update per IO
> really hurt performance. A lockless queue and a new way of caching the object map?
> Any thoughts?

By definition, with a QD=1 there is zero contention on that lock. The lock is really only held for a minuscule amount of time and is dropped while the OSD operation is in progress. Do you actually have any performance metrics to back up this claim? Note that the post state is only hit when you issue a remove / trim / discard operation.

--
Jason
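P.S. One rough way to separate the per-object first-write cost from any lock overhead, reusing the test-pool/with-map-image names from your commands (a sketch only): pre-populate the object map by touching every object once, then rerun the random-write comparison.

# Sketch: touch every object once so the object map is fully marked as existing.
# (Writing the whole 100G image is heavy; a smaller test image with the same
# features would do the same job.)
rbd bench-write -p test-pool --image with-map-image \
    --io-size 4M --io-threads 16 --io-total 100G --io-pattern seq

# Then repeat the random-write run and compare against no-map-image:
rbd bench-write -p test-pool --image with-map-image \
    --io-size 16K --io-threads 16 --io-total 1G --io-pattern rand

If the gap closes with a warm object map, you were measuring the first-write round trip; if the drop persists, that would be the kind of measurement worth looking at.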