I would like to unsubscribe from the general Ceph users mailing list.

Sincerely yours,
------------------------------------------
Ivan
E: chenhui0228@xxxxxxxxx
A: Wuhan, Hubei, China
------------------------------------------

On Wed, 28 Jun 2023 at 01:54, <ceph-users-request@xxxxxxx> wrote:
> Send ceph-users mailing list submissions to
>         ceph-users@xxxxxxx
>
> To subscribe or unsubscribe via email, send a message with subject or
> body 'help' to
>         ceph-users-request@xxxxxxx
>
> You can reach the person managing the list at
>         ceph-users-owner@xxxxxxx
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of ceph-users digest..."
>
> Today's Topics:
>
>    1. Re: ceph orch host label rm : does not update label removal
>       (Adiga, Anantha)
>    2. Re: RBD with PWL cache shows poor performance compared to cache device
>       (Josh Baergen)
>    3. Re: RBD with PWL cache shows poor performance compared to cache device
>       (Matthew Booth)
>
>
> ----------------------------------------------------------------------
>
> Date: Tue, 27 Jun 2023 16:04:05 +0000
> From: "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>
> Subject: Re: ceph orch host label rm : does not update label removal
> To: "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>, "ceph-users@xxxxxxx" <ceph-users@xxxxxxx>
> Message-ID: <CY5PR11MB62118438F851BE59AC91F85BF627A@xxxxxxxxxxxxxxxxxxrd11.prod.outlook.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Hello,
>
> This issue is resolved. The syntax used to provide the labels was not
> correct: the comma-separated string was taken as a single label rather
> than as a list of labels.
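>
> For anyone hitting the same symptom, a minimal sketch of the
> one-label-per-command form (the host name is taken from the transcript
> below; the individual label names are only illustrative):
>
>     ceph orch host label add fl31ca104ja0302 mgrs
>     ceph orch host label add fl31ca104ja0302 rgws
>
>     ceph orch host label rm fl31ca104ja0302 mgrs
>     ceph orch host label rm fl31ca104ja0302 rgws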
>
> -----Original Message-----
> From: Adiga, Anantha <anantha.adiga@xxxxxxxxx>
> Sent: Thursday, June 22, 2023 1:08 PM
> To: ceph-users@xxxxxxx
> Subject: ceph orch host label rm : does not update label removal
>
> Hi,
>
> Not sure whether the labels are really removed or the update is not
> working. This was taken as a single label: mgrs,ceph osd,rgws.ceph
>
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                            STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
> 4 hosts in cluster
>
> root@fl31ca104ja0201:/# ceph orch host label rm fl31ca104ja0302 mgrs,ceph osd,rgws.ceph
> Removed label mgrs,ceph osd,rgws.ceph from host fl31ca104ja0302
>
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                            STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin
> 4 hosts in cluster
>
> Thank you,
> Anantha
>
>
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                            STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
> 4 hosts in cluster
> root@fl31ca104ja0201:/#
> root@fl31ca104ja0201:/#
> root@fl31ca104ja0201:/# ceph orch host label rm fl31ca104ja0302 rgws.ceph
> Removed label rgws.ceph from host fl31ca104ja0302
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                            STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
> 4 hosts in cluster
> root@fl31ca104ja0201:/# ceph orch host label rm fl31ca104ja0302 rgws.ceph --force
> Removed label rgws.ceph from host fl31ca104ja0302
> root@fl31ca104ja0201:/# ceph orch host ls
> HOST             ADDR           LABELS                                            STATUS
> fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
> fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
> fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
> 4 hosts in cluster
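>
> One way to see exactly what cephadm has stored for each host (for
> example, whether a comma-separated string ended up as a single label)
> is the structured listing; a small sketch, assuming the standard
> orchestrator output formatter:
>
>     ceph orch host ls --format json-pretty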
>
> Regards,
> Anantha
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> ------------------------------
>
> Date: Tue, 27 Jun 2023 11:20:01 -0600
> From: Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
> Subject: Re: RBD with PWL cache shows poor performance compared to cache device
> To: Matthew Booth <mbooth@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxx
> Message-ID: <CA+5zLQ+R4Yq+4F4GwTZKqz4VYj5sN7gniSf3X5wnsk06F_UFtQ@xxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi Matthew,
>
> We've done a limited amount of work on characterizing the pwl, and I think
> it suffers from the classic problem of some writeback caches in that, once
> the cache is saturated, it's actually worse than just being in
> writethrough. IIRC the pwl does try to preserve write ordering (unlike the
> other writeback/writearound modes), which limits the concurrency it can
> issue to the backend; this means that even an iodepth=1 test can saturate
> the pwl, assuming the backend latency is higher than the pwl latency.
>
> I _think_ that if you were able to devise a burst test with bursts smaller
> than the pwl capacity and gaps in between large enough for the cache to
> flush, or if you were to ratelimit I/Os to the pwl, you should see
> something closer to the lower latencies that you would expect.
>
> Josh
>
> On Tue, Jun 27, 2023 at 9:04 AM Matthew Booth <mbooth@xxxxxxxxxx> wrote:
>
> > ** TL;DR
> >
> > In testing, the write latency performance of a PWL-cache-backed RBD
> > disk was two orders of magnitude worse than that of the disk holding
> > the PWL cache.
> >
> > ** Summary
> >
> > I was hoping that the PWL cache might be a good solution to the
> > problem of etcd's write latency requirements when running a Kubernetes
> > control plane on Ceph. Etcd is extremely write-latency sensitive and
> > becomes unstable if write latency is too high. The etcd workload can
> > be characterised by very small (~4k) writes with a queue depth of 1.
> > Throughput, even on a busy system, is normally very low. As etcd is
> > distributed and can safely handle the loss of un-flushed data from a
> > single node, a local-SSD PWL cache for etcd looked like an ideal
> > solution.
> >
> > My expectation was that adding a PWL cache on a local SSD to an
> > RBD-backed VM would improve write latency to something approaching the
> > write latency performance of the local SSD. However, in my testing,
> > adding a PWL cache to an RBD-backed VM increased write latency by
> > approximately 4x over not using a PWL cache. This was over 100x more
> > than the write latency of the underlying SSD.
> >
> > My expectation was based on the documentation here:
> > https://docs.ceph.com/en/quincy/rbd/rbd-persistent-write-log-cache/
> >
> > “The cache provides two different persistence modes. In
> > persistent-on-write mode, the writes are completed only when they are
> > persisted to the cache device and will be readable after a crash. In
> > persistent-on-flush mode, the writes are completed as soon as it no
> > longer needs the caller’s data buffer to complete the writes, but does
> > not guarantee that writes will be readable after a crash. The data is
> > persisted to the cache device when a flush request is received.”
> >
> > ** Method
> >
> > 2 systems: one running single-node Ceph Quincy (17.2.6), the other
> > running libvirt and mounting a VM’s disk with librbd (also 17.2.6)
> > from the first node.
> >
> > All performance testing is from the libvirt system. I tested write
> > latency performance:
> >
> > * Inside the VM without a PWL cache
> > * Of the PWL device directly from the host (direct to filesystem, no VM)
> > * Inside the VM with a PWL cache
> >
> > I am testing with fio. Specifically, I am running a containerised
> > test, executed with:
> >
> >     podman run --volume .:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf
> >
> > This container runs:
> >
> >     fio --rw=write --ioengine=sync --fdatasync=1 \
> >         --directory=/var/lib/etcd --size=100m --bs=8000 --name=etcd_perf \
> >         --output-format=json --runtime=60 --time_based=1
> >
> > and extracts sync.lat_ns.percentile["99.000000"].
> >
> > ** Results
> >
> > All results were stable across multiple runs within a small margin of
> > error.
> >
> > * rbd no cache:       1417216 ns
> > * pwl cache device:     44288 ns
> > * rbd with pwl cache: 5210112 ns
> >
> > Note that adding a PWL cache increases write latency by approximately
> > 4x (5210112 ns vs 1417216 ns at the 99th percentile), which is more
> > than 100x the latency of the underlying cache device (44288 ns).
> >
> > ** Hardware
> >
> > 2 x Dell R640, each with a Xeon Silver 4216 CPU @ 2.10GHz and 192G RAM.
> > Storage under test: 2 x SAMSUNG MZ7KH480HAHQ0D3 SSDs attached to a PERC
> > H730P Mini (Embedded).
> >
> > OS installed on rotational disks.
> >
> > N.B. Linux incorrectly detects these disks as rotational, which I
> > assume relates to weird behaviour by the PERC controller. I remembered
> > to manually correct this on the ‘client’ machine for the PWL cache,
> > but at OSD configuration time Ceph would have detected them as
> > rotational. They are not rotational.
> >
> > ** Ceph Configuration
> >
> > CentOS Stream 9
> >
> > # ceph version
> > ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
> >
> > Single-node installation with cephadm. 2 OSDs, one on each SSD.
> > 1 pool with size 2.
> >
> > ** Client Configuration
> >
> > Fedora 38
> > librbd1-17.2.6-3.fc38.x86_64
> >
> > The PWL cache is an XFS filesystem with a 4k block size, matching the
> > underlying device. The filesystem uses the whole block device. There
> > is no other load on the system.
> >
> > ** RBD Configuration
> >
> > # rbd config image list libvirt-pool/pwl-test | grep cache
> > rbd_cache                            true                         config
> > rbd_cache_block_writes_upfront       false                        config
> > rbd_cache_max_dirty                  25165824                     config
> > rbd_cache_max_dirty_age              1.000000                     config
> > rbd_cache_max_dirty_object           0                            config
> > rbd_cache_policy                     writeback                    pool
> > rbd_cache_size                       33554432                     config
> > rbd_cache_target_dirty               16777216                     config
> > rbd_cache_writethrough_until_flush   true                         pool
> > rbd_parent_cache_enabled             false                        config
> > rbd_persistent_cache_mode            ssd                          pool
> > rbd_persistent_cache_path            /var/lib/libvirt/images/pwl  pool
> > rbd_persistent_cache_size            1073741824                   config
> > rbd_plugins                          pwl_cache                    pool
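> >
> > For reference, the pool-level entries in this listing could be
> > produced with commands along these lines (a sketch only; the pool
> > name, path, and values are simply the ones shown above, and this is
> > not necessarily how this cluster was configured):
> >
> >     rbd config pool set libvirt-pool rbd_plugins pwl_cache
> >     rbd config pool set libvirt-pool rbd_persistent_cache_mode ssd
> >     rbd config pool set libvirt-pool rbd_persistent_cache_path /var/lib/libvirt/images/pwl
> >     rbd config pool set libvirt-pool rbd_cache_policy writeback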
> >
> > # rbd status libvirt-pool/pwl-test
> > Watchers:
> >         watcher=10.1.240.27:0/1406459716 client.14475 cookie=140282423200720
> > Persistent cache state:
> >         host: dell-r640-050
> >         path: /var/lib/libvirt/images/pwl/rbd-pwl.libvirt-pool.37e947fd216b.pool
> >         size: 1 GiB
> >         mode: ssd
> >         stats_timestamp: Mon Jun 26 11:29:21 2023
> >         present: true    empty: false    clean: true
> >         allocated: 180 MiB
> >         cached: 135 MiB
> >         dirty: 0 B
> >         free: 844 MiB
> >         hits_full: 1 / 0%
> >         hits_partial: 3 / 0%
> >         misses: 21952
> >         hit_bytes: 6 KiB / 0%
> >         miss_bytes: 349 MiB
> > --
> > Matthew Booth
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> ------------------------------
>
> Date: Tue, 27 Jun 2023 18:50:07 +0100
> From: Matthew Booth <mbooth@xxxxxxxxxx>
> Subject: Re: RBD with PWL cache shows poor performance compared to cache device
> To: Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxx
> Message-ID: <CAEkQehcGq-88heq3UN8tsWsTyOfTcm1122Ffxy7PEhXF7Hj1mA@xxxxxxxxxxxxxx>
> Content-Type: text/plain; charset="UTF-8"
>
> On Tue, 27 Jun 2023 at 18:20, Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
> >
> > Hi Matthew,
> >
> > We've done a limited amount of work on characterizing the pwl, and I
> > think it suffers from the classic problem of some writeback caches in
> > that, once the cache is saturated, it's actually worse than just being
> > in writethrough. IIRC the pwl does try to preserve write ordering
> > (unlike the other writeback/writearound modes), which limits the
> > concurrency it can issue to the backend; this means that even an
> > iodepth=1 test can saturate the pwl, assuming the backend latency is
> > higher than the pwl latency.
>
> What do you mean by saturated here? FWIW I was using the default cache
> size of 1G, and each test run only wrote ~100MB of data, so I don't
> think I ever filled the cache, even with multiple runs.
>
> > I _think_ that if you were able to devise a burst test with bursts
> > smaller than the pwl capacity and gaps in between large enough for the
> > cache to flush, or if you were to ratelimit I/Os to the pwl, you should
> > see something closer to the lower latencies that you would expect.
>
> My goal is to characterise the requirements of etcd, and unfortunately I
> don't think changing the test would do that. Incidentally, note that the
> total bandwidth of an extremely busy etcd is usually very low. From
> memory, the etcd write rate for a system we were debugging, whose etcd
> was occasionally falling over due to load, was only about 5MiB/s. It's
> all about the write latency of really small writes, not bandwidth.
>
> Matt
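>
> As an illustration of the ratelimiting idea discussed above, a hedged
> variant of the etcd-perf fio job could look something like the sketch
> below; the --rate value is only an assumption, picked to roughly match
> the ~5MiB/s figure mentioned above, and the job name is illustrative:
>
>     fio --name=etcd_perf_ratelimited --rw=write --ioengine=sync --fdatasync=1 \
>         --directory=/var/lib/etcd --size=100m --bs=8000 --rate=5m \
>         --output-format=json --runtime=60 --time_based=1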
>
> --
> Matthew Booth
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> ------------------------------
>
> End of ceph-users Digest, Vol 108, Issue 88
> *******************************************
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx