KVM/QEMU rbd read latency

lacroute@xxxxxxxxxxxxxxxxxx (Phil Lacroute) · Fri, 17 Feb 2017 12:35:44 -0800

Thanks everyone for the suggestions.  Disabling the RBD cache, disabling the debug logging, and building QEMU with jemalloc each had a significant impact.  Performance is up from 25K IOPS to 63K IOPS.  Hopefully the ongoing work to reduce the number of buffer copies will yield further improvements.

I have a followup question about the debug logging.  Is there any way to dump the in-memory logs from the QEMU RBD client?  If not (and I couldn't find a way to do this), then nothing is lost by disabling the logging on client machines.

Thanks,
Phil

> On Feb 16, 2017, at 1:20 PM, Jason Dillaman <jdillama at redhat.com> wrote:
> 
> A few additional suggestions:
> 
> 1) For high IOPS random read workloads, the librbd cache is most likely going to be a bottleneck and is providing zero benefit. Recommend setting "cache=none" on your librbd QEMU disk to disable it.
> 
> 2) Disable logging via your ceph.conf. Example settings:
> 
> debug_auth = 0/0
> debug_buffer = 0/0
> debug_context = 0/0
> debug_crypto = 0/0
> debug_finisher = 0/0
> debug_ms = 0/0
> debug_objectcacher = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_striper = 0/0
> debug_tp = 0/0
> 
> The above two config changes on my small development cluster take my librbd 4K random read performance from ~9.5K to ~12.5K IOPS (+32%).
> 
> 3) librbd / librados is very heavy with small memory allocations on the IO path and previous reports have indicated that using jemalloc w/ QEMU shows large improvements.
> 
> LD_PRELOADing jemalloc within fio using the optimized config takes me from ~12.5K IOPS to ~13.5K IOPS (+8%).
> 
> 
> On Thu, Feb 16, 2017 at 3:38 PM, Steve Taylor <steve.taylor at storagecraft.com> wrote:
> 
> You might try running fio directly on the host using the rbd ioengine (direct librbd) and see how that compares. The major difference between that and the krbd test will be the page cache readahead, which will be present in the krbd stack but not with the rbd ioengine. I would have expected the guest OS to normalize that some due to its own page cache in the librbd test, but that might at least give you some more clues about where to look further.
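
A minimal sketch of what such a host-side test could look like, written as a small Python helper that renders a fio job file for the rbd ioengine. The pool, image, and client names are taken from the libvirt config in the original post (app/image1, user app1); the function name and the job name "rbd-randread" are made up for illustration, and the option set is an assumption based on fio's rbd engine options (clientname, pool, rbdname), not a tested configuration.

```python
def render_fio_job(pool, image, client, iodepth=128):
    """Return the text of a fio job file that reads an RBD image through librbd."""
    options = {
        "ioengine": "rbd",      # fio's librbd engine, bypasses krbd and the page cache
        "clientname": client,   # cephx user name without the "client." prefix
        "pool": pool,
        "rbdname": image,
        "rw": "randread",       # same access pattern as the original test
        "bs": "4k",
        "iodepth": str(iodepth),
    }
    lines = ["[rbd-randread]"]  # hypothetical job name
    lines += [f"{key}={value}" for key, value in options.items()]
    return "\n".join(lines) + "\n"


# Values assumed from the libvirt <source> and <auth> elements in the original post.
job_text = render_fio_job(pool="app", image="image1", client="app1")
print(job_text)
```

Saving the printed text to a file and passing it to fio on the client host would exercise librbd without QEMU or the guest in the path, which helps separate librbd overhead from virtualization overhead.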
> 
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces at lists.ceph.com] On Behalf Of Phil Lacroute
> Sent: Thursday, February 16, 2017 11:54 AM
> To: ceph-users at lists.ceph.com
> Subject: [ceph-users] KVM/QEMU rbd read latency
> 
> Hi,
> 
> I am doing some performance characterization experiments for ceph with KVM guests, and I'm observing significantly higher read latency when using the QEMU rbd client compared to krbd.  Is that expected or have I missed some tuning knobs to improve this?
> 
> Cluster details:
> Note that this cluster was built for evaluation purposes, not production, hence the choice of small SSDs with low endurance specs.
> Client host OS: Debian, 4.7.0 kernel
> QEMU version 2.7.0
> Ceph version Jewel 10.2.3
> Client and OSD CPU: Xeon D-1541 2.1 GHz
> OSDs: 5 nodes, 3 SSDs each, one journal partition and one data partition per SSD, XFS data file system (15 OSDs total)
> Disks: DC S3510 240GB
> Network: 10 GbE, dedicated switch for storage traffic
> Guest OS: Debian, virtio drivers
> 
> Performance testing was done with fio on raw disk devices using this config:
> ioengine=libaio
> iodepth=128
> direct=1
> size=100%
> rw=randread
> bs=4k
> 
> Case 1: krbd, fio running on the raw rbd device on the client host (no guest)
> IOPS: 142k
> Average latency: 0.9 msec
> 
> Case 2: krbd, fio running in a guest (libvirt config below)
>    <disk type='file' device='disk'>
>      <driver name='qemu' type='raw' cache='none'/>
>      <source file='/dev/rbd0'/>
>      <backingStore/>
>      <target dev='vdb' bus='virtio'/>
>    </disk>
> IOPS: 119k
> Average Latency: 1.1 msec
> 
> Case 3: QEMU RBD client, fio running in a guest (libvirt config below)
>    <disk type='network' device='disk'>
>      <driver name='qemu'/>
>      <auth username='app1'>
>        <secret type='ceph' usage='app_pool'/>
>      </auth>
>      <source protocol='rbd' name='app/image1'/>
>      <target dev='vdc' bus='virtio'/>
>    </disk>
> IOPS: 25k
> Average Latency: 5.2 msec
> 
> The question is why the test with the QEMU RBD client (case 3) shows 4 msec of additional latency compared to the guest using the krbd-mapped image (case 2).
> 
> Note that the IOPS bottleneck for all of these cases is the rate at which the client issues requests, which is limited by the average latency and the maximum number of outstanding requests (128).  Since the latency is the dominant factor in average read throughput for these small accesses, we would really like to understand the source of the additional latency.
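
The relationship described above can be checked with a few lines of arithmetic. This is a rough sketch, assuming the queue stays full at iodepth=128 for the whole run so that IOPS is approximately queue depth divided by average latency; the helper names are made up for illustration. With the reported latencies it predicts roughly 142K, 116K and 25K IOPS for cases 1 to 3, close to the measured 142k, 119k and 25k (the reported latencies are rounded). For the 63K IOPS figure in the follow-up, the same relation implies an average latency of about 2.0 msec, assuming the fio settings were unchanged, which the follow-up does not state.

```python
IODEPTH = 128  # maximum outstanding requests from the fio config


def iops_from_latency(iodepth, latency_ms):
    """Estimate IOPS for a fully loaded queue: outstanding requests / latency."""
    return iodepth / (latency_ms / 1000.0)


def latency_ms_from_iops(iodepth, iops):
    """Invert the same relation to estimate average latency in milliseconds."""
    return iodepth / iops * 1000.0


# (average latency in msec, reported IOPS) for the three cases in this message
cases = {
    "case 1 (krbd on host)": (0.9, 142_000),
    "case 2 (krbd in guest)": (1.1, 119_000),
    "case 3 (QEMU RBD client)": (5.2, 25_000),
}

for name, (latency_ms, reported) in cases.items():
    predicted = iops_from_latency(IODEPTH, latency_ms)
    print(f"{name}: predicted {predicted:,.0f} IOPS, reported {reported:,}")

# Implied latency for the tuned configuration, assuming iodepth is still 128.
tuned_latency_ms = latency_ms_from_iops(IODEPTH, 63_000)
print(f"tuned QEMU RBD client: implied latency {tuned_latency_ms:.2f} msec")
```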
> 
> Thanks,
> Phil
> 
> -- 
> Jason
