Hello,

just to throw some hard numbers into the ring, I've (very much STRESS) tested readproxy vs. readforward, with more or less the expected results.

New Jewel cluster, 3 cache-tier nodes (5 SSD OSDs each), 3 HDD nodes, IPoIB network. Notably 2x E5-2623 v3 @ 3.00GHz in the cache-tier nodes.

2 VMs (on different compute nodes, though neither network nor CPU was a bottleneck there), running fio. After creating a 12GB fio file on each and filling it, the cache was flushed/evicted. Then it was filled again for one VM by running the write fio again, while reading on the other VM, resulting in hot pagecaches, slab, etc. on all 6 OSD nodes.

Fio command line:
---
fio --size=12G --ioengine=libaio --invalidate=1 --direct=1 --numjobs=1 --rw=randwrite --name=fiojob --blocksize=4K --iodepth=32
---
With randread of course on the other VM.

Solo performance (with readproxy or readforward, no difference):
RandRead:  25k IOPS (these all go to the HDD nodes)
RandWrite: 21k IOPS (these all go to the SSD cache-tier nodes)

Note that during randwrites the cache-tier nodes are using about 80-90% of their CPU, the OSD processes eating more than 300% each (of 1600 total). Neither the SSDs nor the network are maxed out, the latter far, far from it.

Concurrent performance with readproxy:
RandRead:   6k IOPS (nearly no idle CPU left on the cache-tier nodes)
RandWrite: 20k IOPS

Concurrent performance with readforward:
RandRead:  14k IOPS
RandWrite: 20k IOPS

So writes seem to be not particularly impacted by the forwarding/proxying going on in parallel. And while readforward still suffers from what I assume is CPU contention on the cache-tier nodes, it is unsurprisingly more than twice as fast as readproxy. Too bad that it will still eat your babies, supposedly.

And again, no, the network is not the bottleneck.
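For anyone wanting to repeat this: the reader VM ran the identical fio job, just with --rw=randread, and the flush/evict and mode switches between runs were done roughly as below ("cache" stands in for my actual cache pool name here):
---
# reader VM: same job as above, reads instead of writes
fio --size=12G --ioengine=libaio --invalidate=1 --direct=1 --numjobs=1 --rw=randread --name=fiojob --blocksize=4K --iodepth=32

# flush/evict everything from the cache pool between runs
rados -p cache cache-flush-evict-all

# switch the mode under test; readforward requires the force flag,
# as per the EPERM in the quoted thread below
ceph osd tier cache-mode cache readproxy
ceph osd tier cache-mode cache readforward --yes-i-really-mean-it
---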
Christian

On Fri, 9 Jun 2017 11:45:46 +0900 Christian Balzer wrote:

> On Thu, 8 Jun 2017 07:06:04 -0400 Alfredo Deza wrote:
>
> > On Thu, Jun 8, 2017 at 3:38 AM, Christian Balzer <chibi@xxxxxxx> wrote:
> > > On Thu, 8 Jun 2017 17:03:15 +1000 Brad Hubbard wrote:
> > >
> > >> On Thu, Jun 8, 2017 at 3:47 PM, Christian Balzer <chibi@xxxxxxx> wrote:
> > >> > On Thu, 8 Jun 2017 15:29:05 +1000 Brad Hubbard wrote:
> > >> >
> > >> >> On Thu, Jun 8, 2017 at 3:10 PM, Christian Balzer <chibi@xxxxxxx> wrote:
> > >> >> > On Thu, 8 Jun 2017 14:21:43 +1000 Brad Hubbard wrote:
> > >> >> >
> > >> >> >> On Thu, Jun 8, 2017 at 1:06 PM, Christian Balzer <chibi@xxxxxxx> wrote:
> > >> >> >> >
> > >> >> >> > Hello,
> > >> >> >> >
> > >> >> >> > New cluster, Jewel, setting up cache-tiering:
> > >> >> >> > ---
> > >> >> >> > Error EPERM: 'readforward' is not a well-supported cache mode and may corrupt your data. pass --yes-i-really-mean-it to force.
> > >> >> >> > ---
> > >> >> >> >
> > >> >> >> > That's new and certainly wasn't there in Hammer, nor did it whine
> > >> >> >> > about this when upgrading my test cluster to Jewel.
> > >> >> >> >
> > >> >> >> > And speaking of whining, I did that about this and readproxy, though
> > >> >> >> > not about their stability (readforward has been working flawlessly
> > >> >> >> > for nearly a year in the test cluster) but about their lack of
> > >> >> >> > documentation.
> > >> >> >> >
> > >> >> >> > So while of course there is no warranty for anything with OSS, is
> > >> >> >> > there any real reason for the above scaremongering or is that based
> > >> >> >> > solely on lack of testing/experience?
> > >> >> >>
> > >> >> >> https://github.com/ceph/ceph/pull/8210 and
> > >> >> >> https://github.com/ceph/ceph/pull/8210/commits/90fe8e3d0b1ded6d14a6a43ecbd6c8634f691fbe
> > >> >> >> may offer some insight.
> > >> >> >>
> > >> >> > They do, though of course they immediately raise the following questions:
> > >> >> >
> > >> >> > 1. Where is that mode documented?
> > >> >>
> > >> >> It *was* documented by
> > >> >> https://github.com/ceph/ceph/pull/7023/commits/d821acada39937b9dacf87614c924114adea8a58
> > >> >> in https://github.com/ceph/ceph/pull/7023 but was removed by
> > >> >> https://github.com/ceph/ceph/commit/6b6b38163b7742d97d21457cf38bdcc9bde5ae1a
> > >> >> in https://github.com/ceph/ceph/pull/9070
> > >> >>
> > >> >
> > >> > I was talking about proxy, which isn't AFAICT, nor is there a BIG bold red
> > >>
> > >> That was hard to follow for me, in a thread titled "Cache mode
> > >> readforward mode will eat your babies?".
> > >>
> > > Context: the initial github bits talk about proxy.
> > >
> > > Anyway, the documentation is in utter shambles and wrong, and this
> > > really should have been mentioned more clearly in the release notes, but
> > > then again none of the other cache changes were, never mind the wrong
> > > osd_tier_promote_max* defaults.
> > >
> > > So for the record:
> > >
> > > The readproxy mode does what the old documentation states and proxies
> > > objects through the cache-tier when they are read, w/o promoting them[*],
> > > while written objects will go into the cache-tier as usual and at the
> > > configured rate.
> > >
> > > [*]
> > > Pro tip: It does however do the silent 0-byte object creation for reads,
> > > so your cache-tier storage performance will be somewhat impacted, in
> > > addition to the CPU usage there that readforward would also have avoided.
> > > This is important when considering the value for "target_max_objects", as a
> > > writeback-mode cache will likely evict things based on space used and
> > > reach a natural upper object limit.
> > > For example, an existing cache-tier in writeback mode here has a 2GB size
> > > and 560K objects, with 13.4TB and 3.6M objects on the backing storage.
> > > With readproxy and a similarly sized cluster I'll be setting
> > > "target_max_objects" to something around 2M to avoid needless eviction and
> > > then re-creation of null objects when things are read.
> >
> > Thank you for taking the time to explain this on the mailing list;
> > could you help us by submitting a pull request with this
> > documentation addition?
> >
>
> I'll review that whole page again, it's riddled with stuff.
> Like the eviction settings suddenly talking about flushing,
> which doesn't help when most people are confused by those two things
> initially anyway.
>
> Christian
>
> > I would be happy to review and merge.
> >
> > >
> > > Christian
> > >
> > >> > statement in the release notes (or docs) for everybody to switch from
> > >> > (read)forward to (read)proxy.
> > >> >
> > >> > And the two bits up there have _very_ conflicting statements about what
> > >> > readproxy does: the older one would do what I want (at the cost of
> > >> > shuffling everything through the cache-tier network pipes), the newer one
> > >> > seems to actually describe the proxy functionality (no new objects, i.e.
> > >> > from writes, being added).
> > >> >
> > >> > I'll be ready to play with my new cluster in a bit and shall investigate
> > >> > what actually does what.
> > >> >
> > >> > Christian
> > >> >
> > >> >> HTH.
> > >> >>
> > >> >> >
> > >> >> > 2. The release notes aren't any particular help there either, and the
> > >> >> > issues/PRs talk about forward, not readforward, as the culprit.
> > >> >> >
> > >> >> > 3. What I can glean from the bits I found: proxy just replaces the
> > >> >> > forward functionality. Alas, what I'm after is a mode that will not
> > >> >> > promote reads to the cache, aka readforward. Or another set of
> > >> >> > parameters that will produce the same results.
> > >> >> >
> > >> >> > Christian
> > >> >> >
> > >> >> >> >
> > >> >> >> > Christian
> > >> >> >> > --
> > >> >> >> > Christian Balzer        Network/Systems Engineer
> > >> >> >> > chibi@xxxxxxx           Rakuten Communications
> > >> >> >> > _______________________________________________
> > >> >> >> > ceph-users mailing list
> > >> >> >> > ceph-users@xxxxxxxxxxxxxx
> > >> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >> >> >>
> > >> >> >
> > >> >> > --
> > >> >> > Christian Balzer        Network/Systems Engineer
> > >> >> > chibi@xxxxxxx           Rakuten Communications
> > >> >>
> > >> >
> > >> > --
> > >> > Christian Balzer        Network/Systems Engineer
> > >> > chibi@xxxxxxx           Rakuten Communications
> > >>
> > >
> > > --
> > > Christian Balzer        Network/Systems Engineer
> > > chibi@xxxxxxx           Rakuten Communications
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com