Re: ceph-users Digest, Vol 108, Issue 88


 



I would like to unsubscribe from the ceph-users general mailing list.

Sincerely Yours,
------------------------------------------
Ivan
E_: chenhui0228@xxxxxxxxx
A_: Wuhan, Hubei, China
------------------------------------------


<ceph-users-request@xxxxxxx> wrote on Wed, 28 Jun 2023 at 01:54:

> Send ceph-users mailing list submissions to
>         ceph-users@xxxxxxx
>
> To subscribe or unsubscribe via email, send a message with subject or
> body 'help' to
>         ceph-users-request@xxxxxxx
>
> You can reach the person managing the list at
>         ceph-users-owner@xxxxxxx
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of ceph-users digest..."
>
> Today's Topics:
>
>    1. Re: ceph orch host label rm : does not update label removal
>       (Adiga, Anantha)
>    2. Re: RBD with PWL cache shows poor performance compared to cache device
>       (Josh Baergen)
>    3. Re: RBD with PWL cache shows poor performance compared to cache device
>       (Matthew Booth)
>
>
> ----------------------------------------------------------------------
>
> Date: Tue, 27 Jun 2023 16:04:05 +0000
> From: "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>
> Subject:  Re: ceph orch host label rm : does not update
>         label removal
> To: "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>, "ceph-users@xxxxxxx"
>         <ceph-users@xxxxxxx>
> Message-ID: <CY5PR11MB62118438F851BE59AC91F85BF627A@xxxxxxxxxxxxxxxxxx
>         rd11.prod.outlook.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Hello,
>
> This issue is resolved.
>
> The syntax used to provide the labels was not correct: they had been
> supplied in a form that the orchestrator treated as a single label (see
> the output below).
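>
> For reference, a rough sketch of what the corrected commands might look
> like, assuming the intent was to give the host the same mgrs/osds/rgws
> labels as the other hosts (the exact label names here are illustrative):
>
>   # Remove the malformed combined label exactly as stored, quoting it
>   # because it contains spaces and commas:
>   ceph orch host label rm fl31ca104ja0302 "mgrs,ceph osd,rgws.ceph"
>   # Then add the intended labels one at a time:
>   ceph orch host label add fl31ca104ja0302 mgrs
>   ceph orch host label add fl31ca104ja0302 osds
>   ceph orch host label add fl31ca104ja0302 rgws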
>
> -----Original Message-----
> From: Adiga, Anantha <anantha.adiga@xxxxxxxxx>
> Sent: Thursday, June 22, 2023 1:08 PM
> To: ceph-users@xxxxxxx
> Subject:  ceph orch host label rm : does not update label
> removal
>
> Hi ,
>
> I am not sure whether the labels are really removed or the update is
> simply not being shown. The following was taken as a single label:
> mgrs,ceph osd,rgws.ceph
>
>
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                             STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
> 4 hosts in cluster
>
> root@fl31ca104ja0201:/# ceph orch host label rm fl31ca104ja0302 mgrs,ceph osd,rgws.ceph
> Removed label mgrs,ceph osd,rgws.ceph from host fl31ca104ja0302
>
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                             STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin
> 4 hosts in cluster
>
> Thank you,
> Anantha
>
>
>
>
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                             STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
> 4 hosts in cluster
> root@fl31ca104ja0201:/#
> root@fl31ca104ja0201:/#
> root@fl31ca104ja0201:/# ceph orch host label rm fl31ca104ja0302 rgws.ceph
> Removed label rgws.ceph from host fl31ca104ja0302
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                             STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
> 4 hosts in cluster
> root@fl31ca104ja0201:/# ceph orch host label rm fl31ca104ja0302 rgws.ceph --force
> Removed label rgws.ceph from host fl31ca104ja0302
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                             STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
> 4 hosts in cluster
>
> Regards,
> Anantha
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> ------------------------------
>
> Date: Tue, 27 Jun 2023 11:20:01 -0600
> From: Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
> Subject:  Re: RBD with PWL cache shows poor performance
>         compared to cache device
> To: Matthew Booth <mbooth@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxx
> Message-ID:
>         <
> CA+5zLQ+R4Yq+4F4GwTZKqz4VYj5sN7gniSf3X5wnsk06F_UFtQ@xxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi Matthew,
>
> We've done a limited amount of work on characterizing the pwl, and I think
> it suffers from the classic problem of some writeback caches: once the
> cache is saturated, it's actually worse than just being in writethrough.
> IIRC the pwl does try to preserve write ordering (unlike the other
> writeback/writearound modes), which limits the concurrency it can issue to
> the backend. That means even an iodepth=1 test can saturate the pwl,
> assuming the backend latency is higher than the pwl latency.
>
> I _think_ that if you were able to devise a burst test with bursts smaller
> than the pwl capacity and gaps in between large enough for the cache to
> flush, or if you were to rate-limit I/Os to the pwl, you should see
> something closer to the lower latencies that you would expect.
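>
> Purely as an illustrative sketch (not something we've validated), the fio
> job from your test could be turned into a bursty workload with fio's
> thinktime options, e.g.:
>
>   fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd \
>       --size=100m --bs=8000 --name=etcd_perf_burst \
>       --runtime=60 --time_based=1 \
>       --thinktime=1000000 --thinktime_blocks=100
>
> which pauses for ~1s after every 100 writes, giving the cache time to
> flush between bursts. Alternatively, --rate_iops could be used to cap the
> I/O rate.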
>
> Josh
>
> On Tue, Jun 27, 2023 at 9:04 AM Matthew Booth <mbooth@xxxxxxxxxx> wrote:
>
> > ** TL;DR
> >
> > In testing, the write latency performance of a PWL-cache backed RBD
> > disk was 2 orders of magnitude worse than the disk holding the PWL
> > cache.
> >
> > ** Summary
> >
> > I was hoping that PWL cache might be a good solution to the problem of
> > write latency requirements of etcd when running a kubernetes control
> > plane on ceph. Etcd is extremely write latency sensitive and becomes
> > unstable if write latency is too high. The etcd workload can be
> > characterised by very small (~4k) writes with a queue depth of 1.
> > Throughput, even on a busy system, is normally very low. As etcd is
> > distributed and can safely handle the loss of un-flushed data from a
> > single node, a local ssd PWL cache for etcd looked like an ideal
> > solution.
> >
> > My expectation was that adding a PWL cache on a local SSD to an
> > RBD-backed VM would improve write latency to something approaching the
> > write latency performance of the local SSD. However, in my testing
> > adding a PWL cache to an rbd-backed VM increased write latency by
> > approximately 4x over not using a PWL cache. This was over 100x more
> > than the write latency performance of the underlying SSD.
> >
> > My expectation was based on the documentation here:
> > https://docs.ceph.com/en/quincy/rbd/rbd-persistent-write-log-cache/
> >
> > “The cache provides two different persistence modes. In
> > persistent-on-write mode, the writes are completed only when they are
> > persisted to the cache device and will be readable after a crash. In
> > persistent-on-flush mode, the writes are completed as soon as it no
> > longer needs the caller’s data buffer to complete the writes, but does
> > not guarantee that writes will be readable after a crash. The data is
> > persisted to the cache device when a flush request is received.”
> >
> > ** Method
> >
> > 2 systems, 1 running single-node Ceph Quincy (17.2.6), the other
> > running libvirt and mounting a VM’s disk with librbd (also 17.2.6)
> > from the first node.
> >
> > All performance testing is from the libvirt system. I tested write
> > latency performance:
> >
> > * Inside the VM without a PWL cache
> > * Of the PWL device directly from the host (direct to filesystem, no VM)
> > * Inside the VM with a PWL cache
> >
> > I am testing with fio. Specifically I am running a containerised test,
> > executed with:
> >   podman run --volume .:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf
> >
> > This container runs:
> >   fio --rw=write --ioengine=sync --fdatasync=1
> > --directory=/var/lib/etcd --size=100m --bs=8000 --name=etcd_perf
> > --output-format=json --runtime=60 --time_based=1
> >
> > And extracts sync.lat_ns.percentile["99.000000"]
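> >
> > For reference, that value can be pulled out of fio's JSON output with a
> > jq expression along these lines (the output file name is a placeholder):
> >
> >   jq '.jobs[0].sync.lat_ns.percentile["99.000000"]' etcd_perf.json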
> >
> > ** Results
> >
> > All results were stable across multiple runs within a small margin of
> > error.
> >
> > * rbd no cache: 1417216 ns
> > * pwl cache device: 44288 ns
> > * rbd with pwl cache: 5210112 ns
> >
> > Note that by adding a PWL cache we increase write latency by
> > approximately 4x, which is more than 100x that of the underlying device.
> >
> > ** Hardware
> >
> > 2 x Dell R640s, each with Xeon Silver 4216 CPU @ 2.10GHz and 192G RAM
> > Storage under test: 2 x SAMSUNG MZ7KH480HAHQ0D3 SSDs attached to PERC
> > H730P Mini (Embedded)
> >
> > OS installed on rotational disks
> >
> > N.B. Linux incorrectly detects these disks as rotational, which I
> > assume relates to weird behaviour by the PERC controller. I remembered
> > to manually correct this on the ‘client’ machine for the PWL cache,
> > but at OSD configuration time ceph would have detected them as
> > rotational. They are not rotational.
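> >
> > For reference, the rotational flag can be checked and overridden via
> > sysfs, roughly as below (the device name is a placeholder):
> >
> >   cat /sys/block/sdb/queue/rotational     # 1 = kernel treats it as rotational
> >   echo 0 > /sys/block/sdb/queue/rotational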
> >
> > ** Ceph Configuration
> >
> > CentOS Stream 9
> >
> >   # ceph version
> >   ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
> >
> > Single node installation with cephadm. 2 OSDs, one on each SSD.
> > 1 pool with size 2
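> >
> > Illustratively, the pool setup would have looked something like the
> > commands below (a sketch only; the pool name is taken from the RBD
> > configuration section):
> >
> >   ceph osd pool create libvirt-pool
> >   ceph osd pool set libvirt-pool size 2
> >   rbd pool init libvirt-pool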
> >
> > ** Client Configuration
> >
> > Fedora 38
> > Librbd1-17.2.6-3.fc38.x86_64
> >
> > PWL cache is XFS filesystem with 4k block size, matching the
> > underlying device. The filesystem uses the whole block device. There
> > is no other load on the system.
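> >
> > As a sketch of the cache filesystem preparation (assuming /dev/sdb as a
> > placeholder for the SSD), it would look something like:
> >
> >   mkfs.xfs -b size=4096 /dev/sdb
> >   mount /dev/sdb /var/lib/libvirt/images/pwl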
> >
> > ** RBD Configuration
> >
> > # rbd config image list libvirt-pool/pwl-test | grep cache
> > rbd_cache                           true                         config
> > rbd_cache_block_writes_upfront      false                        config
> > rbd_cache_max_dirty                 25165824                     config
> > rbd_cache_max_dirty_age             1.000000                     config
> > rbd_cache_max_dirty_object          0                            config
> > rbd_cache_policy                    writeback                    pool
> > rbd_cache_size                      33554432                     config
> > rbd_cache_target_dirty              16777216                     config
> > rbd_cache_writethrough_until_flush  true                         pool
> > rbd_parent_cache_enabled            false                        config
> > rbd_persistent_cache_mode           ssd                          pool
> > rbd_persistent_cache_path           /var/lib/libvirt/images/pwl  pool
> > rbd_persistent_cache_size           1073741824                   config
> > rbd_plugins                         pwl_cache                    pool
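> >
> > For context, the pool-scoped PWL settings above would typically have
> > been applied with commands along these lines (a sketch; the values are
> > the ones shown in the listing):
> >
> >   rbd config pool set libvirt-pool rbd_plugins pwl_cache
> >   rbd config pool set libvirt-pool rbd_persistent_cache_mode ssd
> >   rbd config pool set libvirt-pool rbd_persistent_cache_path /var/lib/libvirt/images/pwl
> >   rbd config pool set libvirt-pool rbd_cache_policy writeback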
> >
> > # rbd status libvirt-pool/pwl-test
> > Watchers:
> >         watcher=10.1.240.27:0/1406459716 client.14475 cookie=140282423200720
> > Persistent cache state:
> >         host: dell-r640-050
> >         path: /var/lib/libvirt/images/pwl/rbd-pwl.libvirt-pool.37e947fd216b.pool
> >         size: 1 GiB
> >         mode: ssd
> >         stats_timestamp: Mon Jun 26 11:29:21 2023
> >         present: true   empty: false    clean: true
> >         allocated: 180 MiB
> >         cached: 135 MiB
> >         dirty: 0 B
> >         free: 844 MiB
> >         hits_full: 1 / 0%
> >         hits_partial: 3 / 0%
> >         misses: 21952
> >         hit_bytes: 6 KiB / 0%
> >         miss_bytes: 349 MiB
> > --
> > Matthew Booth
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
>
> ------------------------------
>
> Date: Tue, 27 Jun 2023 18:50:07 +0100
> From: Matthew Booth <mbooth@xxxxxxxxxx>
> Subject:  Re: RBD with PWL cache shows poor performance
>         compared to cache device
> To: Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxx
> Message-ID:
>         <
> CAEkQehcGq-88heq3UN8tsWsTyOfTcm1122Ffxy7PEhXF7Hj1mA@xxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="UTF-8"
>
> On Tue, 27 Jun 2023 at 18:20, Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
> wrote:
> >
> > Hi Matthew,
> >
> > We've done a limited amount of work on characterizing the pwl, and I
> > think it suffers from the classic problem of some writeback caches:
> > once the cache is saturated, it's actually worse than just being in
> > writethrough. IIRC the pwl does try to preserve write ordering (unlike
> > the other writeback/writearound modes), which limits the concurrency it
> > can issue to the backend. That means even an iodepth=1 test can
> > saturate the pwl, assuming the backend latency is higher than the pwl
> > latency.
>
> What do you mean by saturated here? FWIW I was using the default cache
> size of 1G and each test run only wrote ~100MB of data, so I don't
> think I ever filled the cache, even with multiple runs.
>
> > I _think_ that if you were able to devise a burst test with bursts
> > smaller than the pwl capacity and gaps in between large enough for the
> > cache to flush, or if you were to rate-limit I/Os to the pwl, you
> > should see something closer to the lower latencies that you would expect.
>
> My goal is to characterise the requirements of etcd. Unfortunately I
> don't think changing the test would do that. Incidentally, note that
> the total bandwidth of an extremely busy etcd is usually very low.
> From memory, the etcd write rate for a system we were debugging, whose
> etcd was occasionally falling over due to load, was only about 5MiB/s.
> It's all about write latency of really small writes, not bandwidth.
>
> Matt
>
> >
> > Josh
> >
>
>
> --
> Matthew Booth
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
> ------------------------------
>
> End of ceph-users Digest, Vol 108, Issue 88
> *******************************************
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



