Re: RBD with PWL cache shows poor performance compared to cache device

Hi Matthew,

I see "rbd with pwl cache: 5210112 ns",  This latency is beyond my expectations and I believe it is unlikely to occur. In theory, this value should be around a few hundred microseconds. But I'm not sure what went wrong in your steps. Can you use perf for latency analysis. Hi  @Ilya Dryomov , do you have any suggestions?

Some perf commands:
admin_socket = /mnt/pmem/cache.asok
ceph --admin-daemon /mnt/pmem/cache.asok perf reset all
ceph --admin-daemon /mnt/pmem/cache.asok perf dump
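
For example, the admin_socket option goes in the client-side ceph.conf (so that qemu's librbd instance exposes the socket), and the counters are reset just before and dumped just after a benchmark run; the sequence below is only a sketch, with the workload step being whatever test you already run in the guest:

# ceph.conf on the qemu host
[client]
    admin_socket = /mnt/pmem/cache.asok

# around a benchmark run
ceph --admin-daemon /mnt/pmem/cache.asok perf reset all
# ... run the 4k, queue-depth-1 write test in the guest ...
ceph --admin-daemon /mnt/pmem/cache.asok perf dump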

-----Original Message-----
From: Matthew Booth <mbooth@xxxxxxxxxx> 
Sent: Monday, July 3, 2023 6:09 PM
To: Yin, Congmin <congmin.yin@xxxxxxxxx>
Cc: Ilya Dryomov <idryomov@xxxxxxxxxx>; Giulio Fidente <gfidente@xxxxxxxxxx>; Tang, Guifeng <guifeng.tang@xxxxxxxxx>; Vikhyat Umrao <vumrao@xxxxxxxxxx>; Jdurgin <Jdurgin@xxxxxxxxxx>; John Fulton <johfulto@xxxxxxxxxx>; Francesco Pantano <fpantano@xxxxxxxxxx>; ceph-users@xxxxxxx
Subject: Re:  RBD with PWL cache shows poor performance compared to cache device

On Fri, 30 Jun 2023 at 08:50, Yin, Congmin <congmin.yin@xxxxxxxxx> wrote:
>
> Hi Matthew,
>
> Due to the latency of the rbd layers, the write latency of the pwl cache is more than ten times that of the raw device.
> I replied directly below the 2 questions.
>
> Best regards.
> Congmin Yin
>
>
> -----Original Message-----
> From: Matthew Booth <mbooth@xxxxxxxxxx>
> Sent: Thursday, June 29, 2023 7:23 PM
> To: Ilya Dryomov <idryomov@xxxxxxxxxx>
> Cc: Giulio Fidente <gfidente@xxxxxxxxxx>; Yin, Congmin 
> <congmin.yin@xxxxxxxxx>; Tang, Guifeng <guifeng.tang@xxxxxxxxx>; 
> Vikhyat Umrao <vumrao@xxxxxxxxxx>; Jdurgin <Jdurgin@xxxxxxxxxx>; John 
> Fulton <johfulto@xxxxxxxxxx>; Francesco Pantano <fpantano@xxxxxxxxxx>; 
> ceph-users@xxxxxxx
> Subject: Re:  RBD with PWL cache shows poor performance 
> compared to cache device
>
> On Wed, 28 Jun 2023 at 22:44, Ilya Dryomov <idryomov@xxxxxxxxxx> wrote:
> >> ** TL;DR
> >>
> >> In testing, the write latency performance of a PWL-cache backed RBD 
> >> disk was 2 orders of magnitude worse than the disk holding the PWL 
> >> cache.
>
>
>
> The PWL cache can use pmem or an SSD as the cache device. For pmem,
> based on my test environment at the time, the numbers were roughly:
> ~10 us write latency for the raw pmem device, ~100 us for the pwl
> cache (the extra latency comes from the rbd layers), and ~1000 us for
> the ceph cluster itself (from the messengers and network). For SSDs
> there are too many models to give a single figure, but they will
> certainly be worse than pmem. Even so, a result that is 2 orders of
> magnitude worse than the cache device is worse than expected. Can you
> provide the detailed values of the three for analysis (SSD, pwl
> cache, ceph cluster)?

I'm not entirely sure what you're asking for. Which values are you looking for?

I did provide 3 sets of test results below; is that what you mean?
* rbd no cache: 1417216 ns
* pwl cache device: 44288 ns
* rbd with pwl cache: 5210112 ns

These are all outputs from the benchmarking test. The first is executing in the VM writing to a ceph RBD disk *without* PWL. The second is executing on the host writing directly to the SSD which is being used for the PWL cache. The third is executing in the VM writing to the same ceph RBD disk, but this time *with* PWL.

Incidentally, the client and server machines are identical, and the SSD used by the client for PWL is the same model used on the server as the OSDs. The SSDs are SAMSUNG MZ7KH480HAHQ0D3 SSDs attached to PERC H730P Mini (Embedded).
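
For concreteness, all three numbers come from the same benchmark, which simulates the etcd workload described below: small (~4k) writes at queue depth 1. A representative fio invocation (not the exact job file used, and the target path is a placeholder) would be:

fio --name=pwl-test --rw=write --bs=4k --iodepth=1 --numjobs=1 \
    --fdatasync=1 --size=64m --filename=<target>

where <target> is the RBD-backed disk inside the VM for the first and third results, and a file on the PWL cache SSD on the host for the second.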

> ==============================================================
>
> >>
> >> ** Summary
> >>
> >> I was hoping that PWL cache might be a good solution to the problem 
> >> of write latency requirements of etcd when running a kubernetes 
> >> control plane on ceph. Etcd is extremely write latency sensitive 
> >> and becomes unstable if write latency is too high. The etcd 
> >> workload can be characterised by very small (~4k) writes with a queue depth of 1.
> >> Throughput, even on a busy system, is normally very low. As etcd is 
> >> distributed and can safely handle the loss of un-flushed data from 
> >> a single node, a local ssd PWL cache for etcd looked like an ideal 
> >> solution.
> >
> >
> > Right, this is exactly the use case that the PWL cache is supposed to address.
>
> Good to know!
>
> >> My expectation was that adding a PWL cache on a local SSD to an 
> >> RBD-backed VM would improve write latency to something approaching the 
> >> write latency performance of the local SSD. However, in my testing 
> >> adding a PWL cache to an rbd-backed VM increased write latency by 
> >> approximately 4x over not using a PWL cache. This was over 100x 
> >> more than the write latency performance of the underlying SSD.
>
>
>
>
> When using an image as the VM's disk, you may have used a command like the one below. In many cases, a parameter such as cache=writeback will also turn on the rbd cache, which is an in-memory cache, and it is normal for the pwl cache to be several times slower than that. Please confirm.
> There is currently no parameter that enables only the pwl cache without the rbd cache. I have tested the latency of the pwl cache (pmem) on its own by modifying the code myself, and it is about twice that of the rbd cache.
>
> qemu -m 1024 -drive 
> format=raw,file=rbd:data/squeeze:rbd_cache=true,cache=writeback

I created the rbd disk by first installing the VM on a local qcow2 file, then copying the data from the qcow2 to rbd, converting to raw.
The command I used was:

`qemu-img convert -f qcow2 -O raw /var/lib/libvirt/images/pwl-test.qcow2 rbd:libvirt-pool/pwl-test:id=libvirt`

I am configuring rbd options from the server by setting options on the pool. I have been confirming that options are being set correctly with `rbd status libvirt-pool/pwl-test` on the server.
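
For reference, the pool-level options involved are of this general form (the cache path and size shown are illustrative, not the exact values in use here):

rbd config pool set libvirt-pool rbd_cache false
rbd config pool set libvirt-pool rbd_plugins pwl_cache
rbd config pool set libvirt-pool rbd_persistent_cache_mode ssd
rbd config pool set libvirt-pool rbd_persistent_cache_path /mnt/pwl-cache
rbd config pool set libvirt-pool rbd_persistent_cache_size 1G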

The latest set of profiling data requested by Mark was generated entirely with `rbd_cache=false`:
https://gist.github.com/mdbooth/2d68b7e081a37e27b78fe396d771427d
--
Matthew Booth

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


