On 7/3/23 04:53, Matthew Booth wrote:
On Thu, 29 Jun 2023 at 14:11, Mark Nelson <mark.nelson@xxxxxxxxx> wrote:
This container runs:
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd --size=100m --bs=8000 --name=etcd_perf \
    --output-format=json --runtime=60 --time_based=1
And extracts sync.lat_ns.percentile["99.000000"]
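(Aside: since the job uses --output-format=json, that value can be pulled out with jq once the report is saved to a file; the file name etcd_perf.json and the single-job index are assumptions here:

  jq '.jobs[0].sync.lat_ns.percentile["99.000000"]' etcd_perf.json)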
Matthew, do you have the rest of the fio output captured? It would be interesting to see whether it's just the 99th percentile that's bad or whether the PWL cache is worse in general.
Sure.
With PWL cache: https://paste.openstack.org/show/820504/
Without PWL cache: https://paste.openstack.org/show/b35e71zAwtYR2hjmSRtR/
With PWL cache, 'rbd_cache'=false:
https://paste.openstack.org/show/byp8ZITPzb3r9bb06cPf/
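(For anyone reproducing this, the client-side settings involved look roughly like the sketch below. Option names are from the RBD persistent write-back cache docs; the path and size values are placeholders:

  [client]
  rbd_cache = true                       # set to false for the third run
  rbd_plugins = pwl_cache
  rbd_persistent_cache_mode = ssd
  rbd_persistent_cache_path = /mnt/pwl   # placeholder
  rbd_persistent_cache_size = 1G)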
Also, how's the CPU usage on the client side? I would be very curious to see
whether unwindpmp shows anything useful (especially lock contention):
https://github.com/markhpc/uwpmp
Just attach it to the client-side process and start out with something
like 100 samples (more samples give better results but take longer). You can run it like:
./unwindpmp -n 100 -p <pid>
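(Roughly, assuming the client is a QEMU process, something like the following grabs the pid and saves a capture; the pgrep pattern and output file name are just examples, and root is typically needed to attach:

  pid=$(pgrep -f qemu-kvm | head -n1)
  sudo ./unwindpmp -n 100 -p "$pid" > unwindpmp_n100_$pid.txt)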
I've included the output in this gist:
https://gist.github.com/mdbooth/2d68b7e081a37e27b78fe396d771427d
That gist contains 4 runs: 2 with PWL enabled and 2 without, and also
a markdown file explaining the collection method.
Matt
Thanks Matt! I looked through the output. Looks like the symbols might
have gotten mangled. I'm not an expert on the RBD client, but I don't
think we would really be calling into
rbd_group_snap_rollback_with_progress from
librbd::cache::pwl::ssd::WriteLogEntry::writeback_bl. Is it possible
you used the libdw backend for unwindpmp? libdw sometimes gives
strange/mangled callgraphs, but I haven't seen that before with
libunwind. Hopefully Congmin Yin or Ilya can confirm whether it's garbage.
So with that said, assuming we can trust these callgraphs at all, it
looks like it might be worth looking at the latency of the
AbstractWriteLog, librbd::cache::pwl::ssd::WriteLogEntry::writeback_bl,
and possibly usage of librados::v14_2_0::IoCtx::object_list. On the
QEMU side, possibly the latency of rbd_aio_flush in both cases. It's
also possible we have md_config_t get_val/set_val in the hot path
somewhere, though that looks minor. If the
rbd_group_snap_rollback_with_progress usage is real, it's significantly
more prevalent in the PWL callgraphs. Without knowing more about how
the PWL cache works, though, I'm not sure whether any of this is
meaningful.
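(As a rough sanity check on that last point, the frame counts can be compared across the captures with grep; the file names below are placeholders for the four runs in the gist:

  grep -c rbd_group_snap_rollback_with_progress pwl_run*.txt nopwl_run*.txt)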
Mark
Best Regards,
Mark Nelson
Head of R&D (USA)
Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nelson@xxxxxxxxx
We are hiring: https://www.clyso.com/jobs/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx