Re: RBD with PWL cache shows poor performance compared to cache device


 



On Mon, 3 Jul 2023 at 18:33, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> On Mon, Jul 3, 2023 at 6:58 PM Mark Nelson <mark.nelson@xxxxxxxxx> wrote:
> >
> >
> > On 7/3/23 04:53, Matthew Booth wrote:
> > > On Thu, 29 Jun 2023 at 14:11, Mark Nelson <mark.nelson@xxxxxxxxx> wrote:
> > >>>>> This container runs:
> > >>>>>      fio --rw=write --ioengine=sync --fdatasync=1
> > >>>>> --directory=/var/lib/etcd --size=100m --bs=8000 --name=etcd_perf
> > >>>>> --output-format=json --runtime=60 --time_based=1
> > >>>>>
> > >>>>> And extracts sync.lat_ns.percentile["99.000000"]
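
For reference, that percentile can be pulled straight out of fio's JSON report with jq; a minimal sketch, assuming the output above was saved to fio-output.json and etcd_perf is the only job in the report:

    jq '.jobs[0].sync.lat_ns.percentile["99.000000"]' fio-output.json
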
> > >>>> Matthew, do you have the rest of the fio output captured?  It would be interesting to see whether it's just the 99th percentile that is bad or whether the PWL cache is worse in general.
> > >>> Sure.
> > >>>
> > >>> With PWL cache: https://paste.openstack.org/show/820504/
> > >>> Without PWL cache: https://paste.openstack.org/show/b35e71zAwtYR2hjmSRtR/
> > >>> With PWL cache, 'rbd_cache'=false:
> > >>> https://paste.openstack.org/show/byp8ZITPzb3r9bb06cPf/
> > >>
> > >> Also, how's the CPU usage client side?  I would be very curious to see
> > >> if unwindpmp shows anything useful (especially lock contention):
> > >>
> > >>
> > >> https://github.com/markhpc/uwpmp
> > >>
> > >>
> > >> Just attach it to the client-side process and start out with something
> > >> like 100 samples (more are better but take longer).  You can run it like:
> > >>
> > >>
> > >> ./unwindpmp -n 100 -p <pid>
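
If it helps, the sample count can be raised and the output redirected to a file for sharing (assuming, as in the captured runs, that it writes the call graph to stdout; the filename here is arbitrary):

    ./unwindpmp -n 1000 -p <pid> > unwindpmp-client.txt
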
> > > I've included the output in this gist:
> > > https://gist.github.com/mdbooth/2d68b7e081a37e27b78fe396d771427d
> > >
> > > That gist contains 4 runs: 2 with PWL enabled and 2 without, and also
> > > a markdown file explaining the collection method.
> > >
> > > Matt
> >
> >
> > Thanks Matt!  I looked through the output.  Looks like the symbols might
> > have gotten mangled.  I'm not an expert on the RBD client, but I don't
> > think we would really be calling into
> > rbd_group_snap_rollback_with_progress from
> > librbd::cache::pwl::ssd::WriteLogEntry::writeback_bl.  Is it possible
> > you used the libdw backend for unwindpmp?  libdw sometimes gives
> > strange/mangled callgraphs, but I haven't seen it before with
> > libunwind.  Hopefully Congmin Yin or Ilya can confirm if it's garbage.
>
> >
> > So with that said, assuming we can trust these callgraphs at all, it
> > looks like it might be worth looking at the latency of the
> > AbstractWriteLog, librbd::cache::pwl::ssd::WriteLogEntry::writeback_bl,
> > and possibly usage of librados::v14_2_0::IoCtx::object_list.  On the
>
> Hi Mark,
>
> Neither the rbd_group_snap_rollback_with_progress nor the
> librados::v14_2_0::IoCtx::object_list entry makes sense to me,
> so I'd say it's garbage.

Unfortunately I'm not at all familiar with this tool. Do you know how
it obtains its symbols? I didn't install any debuginfo packages, so I
was a bit surprised to see any symbols at all.
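
One thing that may be relevant, whether or not it is what unwindpmp actually uses: a shared library's dynamic (exported) symbol table is present even without debuginfo packages, so exported librbd/librados names can often be resolved from it. A quick way to see what librbd exports (the path is an assumption for a typical RHEL-style layout):

    nm -DC /usr/lib64/librbd.so.1 | less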

Matt

-- 
Matthew Booth
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



