Re: RBD with PWL cache shows poor performance compared to cache device

On 7/3/23 04:53, Matthew Booth wrote:
On Thu, 29 Jun 2023 at 14:11, Mark Nelson <mark.nelson@xxxxxxxxx> wrote:
This container runs:
     fio --rw=write --ioengine=sync --fdatasync=1
--directory=/var/lib/etcd --size=100m --bs=8000 --name=etcd_perf
--output-format=json --runtime=60 --time_based=1

And extracts sync.lat_ns.percentile["99.000000"]
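That extraction step could be sketched in Python against fio's JSON output. The field path follows the sync.lat_ns.percentile["99.000000"] key named above; the sample latency value is made up for illustration:

```python
import json

# Minimal sketch of pulling the 99th-percentile fdatasync latency out of
# fio's --output-format=json output. The sample below is a made-up stub;
# in practice you would json.load() the file fio wrote.
fio_output = json.loads("""
{
  "jobs": [
    {
      "jobname": "etcd_perf",
      "sync": {
        "lat_ns": {
          "percentile": {
            "99.000000": 1234567
          }
        }
      }
    }
  ]
}
""")

job = fio_output["jobs"][0]
p99_ns = job["sync"]["lat_ns"]["percentile"]["99.000000"]
print(f"fdatasync p99: {p99_ns / 1e6:.2f} ms")
```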
Matthew, do you have the rest of the fio output captured?  It would be interesting to see whether it's just the 99th percentile that is bad or whether the PWL cache is worse in general.
Sure.

With PWL cache: https://paste.openstack.org/show/820504/
Without PWL cache: https://paste.openstack.org/show/b35e71zAwtYR2hjmSRtR/
With PWL cache, 'rbd_cache'=false:
https://paste.openstack.org/show/byp8ZITPzb3r9bb06cPf/
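For reference, the rbd_cache=false variant above would correspond to client-side settings roughly like the following. This is only a sketch: the option names come from the Ceph PWL documentation, ssd mode and the cache path are assumptions, not details from this thread:

```ini
# Sketch of a client config for the third run (assumptions: ssd-mode PWL,
# illustrative cache path; not taken from the thread)
[client]
rbd_cache = false
rbd_plugins = pwl_cache
rbd_persistent_cache_mode = ssd
rbd_persistent_cache_path = /mnt/pwl-cache
```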

Also, how's the CPU usage client side?  I would be very curious to see
if unwindpmp shows anything useful (especially lock contention):

https://github.com/markhpc/uwpmp

Just attach it to the client-side process and start out with something
like 100 samples (more are better but take longer).  You can run it like:

./unwindpmp -n 100 -p <pid>

I've included the output in this gist:
https://gist.github.com/mdbooth/2d68b7e081a37e27b78fe396d771427d

That gist contains 4 runs: 2 with PWL enabled and 2 without, and also
a markdown file explaining the collection method.

Matt


Thanks Matt!  I looked through the output.  It looks like the symbols might have gotten mangled.  I'm not an expert on the RBD client, but I don't think we would really be calling into rbd_group_snap_rollback_with_progress from librbd::cache::pwl::ssd::WriteLogEntry::writeback_bl.  Is it possible you used the libdw backend for unwindpmp?  libdw sometimes produces strange/mangled callgraphs, but I haven't seen that before with libunwind.  Hopefully Congmin Yin or Ilya can confirm whether the callgraphs are garbage.

So with that said, assuming we can trust these callgraphs at all, it looks like it might be worth looking at the latency of the AbstractWriteLog, librbd::cache::pwl::ssd::WriteLogEntry::writeback_bl, and possibly the usage of librados::v14_2_0::IoCtx::object_list.  On the QEMU side, possibly the latency of rbd_aio_flush in both cases.  It's also possible we have md_config_t get_val/set_val in the hot path somewhere, though that looks minor.  If the rbd_group_snap_rollback_with_progress usage is real, it's significantly more prevalent in the PWL callgraphs.  Without knowing more about how the PWL cache works, though, I'm not sure whether any of this is meaningful.

Mark


Best Regards,
Mark Nelson
Head of R&D (USA)

Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nelson@xxxxxxxxx

We are hiring: https://www.clyso.com/jobs/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



