Re: RBD with PWL cache shows poor performance compared to cache device

Matthew Booth <mbooth@xxxxxxxxxx> · Mon, 3 Jul 2023 10:53:27 +0100

On Thu, 29 Jun 2023 at 14:11, Mark Nelson <mark.nelson@xxxxxxxxx> wrote:
> >>> This container runs:
> >>>     fio --rw=write --ioengine=sync --fdatasync=1 \
> >>>         --directory=/var/lib/etcd --size=100m --bs=8000 --name=etcd_perf \
> >>>         --output-format=json --runtime=60 --time_based=1
> >>>
> >>> And extracts sync.lat_ns.percentile["99.000000"]
> >>
> >> Matthew, do you have the rest of the fio output captured?  It would be interesting to see if it's just the 99th percentile that is bad or the PWL cache is worse in general.
> > Sure.
> >
> > With PWL cache: https://paste.openstack.org/show/820504/
> > Without PWL cache: https://paste.openstack.org/show/b35e71zAwtYR2hjmSRtR/
> > With PWL cache, 'rbd_cache'=false:
> > https://paste.openstack.org/show/byp8ZITPzb3r9bb06cPf/
>
>
> Also, how's the CPU usage client side?  I would be very curious to see
> if unwindpmp shows anything useful (especially lock contention):
>
>
> https://github.com/markhpc/uwpmp
>
>
> Just attach it to the client-side process and start out with something
> like 100 samples (more are better but take longer).  You can run it like:
>
>
> ./unwindpmp -n 100 -p <pid>

I've included the output in this gist:
https://gist.github.com/mdbooth/2d68b7e081a37e27b78fe396d771427d

The gist contains four runs, two with PWL enabled and two without, plus
a markdown file explaining the collection method.
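
For anyone reproducing the numbers, here is a minimal sketch of how the
sync.lat_ns.percentile["99.000000"] value quoted above could be pulled
out of fio's JSON output. It assumes the per-job results sit under a
top-level "jobs" list, which is how I understand fio's JSON layout; the
function name and the sample document are hypothetical, and the sample
value is made up purely to show the shape.

```python
import json


def p99_fdatasync_ms(fio_json_text, job_index=0):
    """Return the 99th percentile fdatasync latency in milliseconds.

    Assumes the layout jobs[i].sync.lat_ns.percentile["99.000000"],
    with the value reported in nanoseconds.
    """
    report = json.loads(fio_json_text)
    job = report["jobs"][job_index]
    p99_ns = job["sync"]["lat_ns"]["percentile"]["99.000000"]
    return p99_ns / 1e6  # nanoseconds -> milliseconds


# Hand-written sample in the assumed shape; the latency is NOT a real result.
sample = json.dumps({
    "jobs": [{
        "jobname": "etcd_perf",
        "sync": {"lat_ns": {"percentile": {"99.000000": 8500000}}},
    }]
})

print(p99_fdatasync_ms(sample))
```

In practice the input would be the text written by the fio command with
--output-format=json, read from a file or captured from stdout.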

Matt
-- 
Matthew Booth