That’s a relief, I was sensing a major case of face palm occurring when I read Jason's email!!!

> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Sage Weil
> Sent: 11 February 2016 21:00
> To: Jason Dillaman <dillaman@xxxxxxxxxx>
> Cc: Nick Fisk <nick@xxxxxxxxxx>; Samuel Just <sjust@xxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx; ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: cls_rbd ops on rbd_id.$name objects in EC pool
>
> I was able to reproduce this on master:
>
> On Thu, 11 Feb 2016, Jason Dillaman wrote:
> > I think I see the problem. It looks like you are performing ops directly
> > against the cache tier instead of the base tier (assuming cache1 is your
> > cache pool). Here are my steps against master where the object is
> > successfully promoted upon 'rbd info':
> >
> > # ceph osd erasure-code-profile set teuthologyprofile ruleset-failure-domain=osd m=1 k=2
> >
> > # ceph osd pool delete rbd rbd --yes-i-really-really-mean-it
> > pool 'rbd' removed
> >
> > # ceph osd pool create rbd 4 4 erasure teuthologyprofile
> > pool 'rbd' created
> >
> > # ceph osd pool create cache 4
> > pool 'cache' created
> >
> > # ceph osd tier add rbd cache
> > pool 'cache' is now (or already was) a tier of 'rbd'
> >
> > # ceph osd tier cache-mode cache writeback
> > set cache-mode for pool 'cache' to writeback
> >
> > # ceph osd tier set-overlay rbd cache
> > overlay for 'rbd' is now (or already was) 'cache'
> >
> > # ceph osd pool set cache hit_set_type bloom
> > set pool 2 hit_set_type to bloom
> >
> > # ceph osd pool set cache hit_set_count 8
> > set pool 2 hit_set_count to 8
> >
> > # ceph osd pool set cache hit_set_period 60
> > set pool 2 hit_set_period to 60
> >
> > # ceph osd pool set cache target_max_objects 250
> > set pool 2 target_max_objects to 250
>
> set pool cache min_read_recency_for_promote 4
>
> > # rbd -p rbd create test --size=1M
> >
> > # for x in {0..10}; do rbd -p rbd info test > /dev/null 2>/dev/null ; done
> >
> > # rados -p cache ls
> > rbd_id.test
> > test.rbd
> > rbd_directory
> > rbd_header.101944ba7335
> >
> > # rados -p cache cache-flush rbd_id.test
> >
> > # rados -p cache cache-evict rbd_id.test
> >
> > # rados -p cache ls
> > test.rbd
> > rbd_directory
> > rbd_header.101944ba7335
> >
> > # rbd -p rbd info test
> > rbd image 'test':
> >         size 1024 kB in 1 objects
> >         order 22 (4096 kB objects)
> >         block_name_prefix: rbd_data.101944ba7335
> >         format: 2
> >         features: layering
> >         flags:
>
> And then I get EOPNOTSUPP too.
>
> The problem is the get_id op does sync_read, which fails.
>
> I think Nick's suggestion is the right one: if we get EOPNOTSUPP we force a
> promotion. Not sure how tricky that will be to get right, though. A
> workaround for rbd might be to put the info in an xattr instead of in the
> data payload.. that's probably more efficient anyway.
>
> sage
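
(For anyone hitting this in the meantime: the practical mitigation that falls out of this thread is to make sure the read itself promotes the object, so the cls_rbd call is served from the cache tier instead of being proxied to the EC base pool. A rough sketch only, using Nick's pool and object names from the test further down; the cache-pin commands assume a jewel-era rados tool and the usual rbd format 2 header-object naming, and are not available on infernalis.)

    # make any read promote (no hit-set history required); Nick demonstrates this below
    ceph osd pool set cache1 min_read_recency_for_promote 0
    ceph osd pool set cache1 min_write_recency_for_promote 0

    # jewel and later only: pin the RBD metadata objects in the cache tier so
    # they can never be flushed/evicted down to the EC pool
    rados -p cache1 cache-pin rbd_id.Test99
    rados -p cache1 cache-pin rbd_header.e8e734689a5e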
>
> > # rados -p cache ls
> > rbd_id.test
> > test.rbd
> > rbd_directory
> > rbd_header.101944ba7335
> >
> > --
> >
> > Jason Dillaman
> > Red Hat Ceph Storage Engineering
> > dillaman@xxxxxxxxxx
> > http://www.redhat.com
> >
> > ----- Original Message -----
> > > From: "Nick Fisk" <nick@xxxxxxxxxx>
> > > To: "Sage Weil" <sweil@xxxxxxxxxx>, "Samuel Just" <sjust@xxxxxxxxxx>
> > > Cc: "Jason Dillaman" <dillaman@xxxxxxxxxx>, ceph-users@xxxxxxxxxxxxxx, ceph-devel@xxxxxxxxxxxxxxx
> > > Sent: Thursday, February 11, 2016 12:46:38 PM
> > > Subject: RE: cls_rbd ops on rbd_id.$name objects in EC pool
> > >
> > > Hi Sage,
> > >
> > > Do you think this will get fixed in time for the Jewel release? It
> > > still seems to happen in Master and is definitely related to the
> > > recency setting. I'm guessing that the info command does some sort of
> > > read and then a write. In the old behaviour the read would have always
> > > triggered a promotion?
> > >
> > > nick@Ceph-Test:~$ ceph osd pool get cache1 min_read_recency_for_promote
> > > min_read_recency_for_promote: 8
> > > nick@Ceph-Test:~$ ceph osd pool get cache1 min_write_recency_for_promote
> > > min_write_recency_for_promote: 8
> > > nick@Ceph-Test:~$ rbd -p cache1 create Test99 --size=10G
> > > nick@Ceph-Test:~$ rbd -p cache1 info Test99
> > > rbd image 'Test99':
> > >         size 10240 MB in 2560 objects
> > >         order 22 (4096 kB objects)
> > >         block_name_prefix: rbd_data.e8e734689a5e
> > >         format: 2
> > >         features: layering
> > >         flags:
> > > nick@Ceph-Test:~$ rados -p cache1 cache-flush rbd_id.Test99
> > > nick@Ceph-Test:~$ rados -p cache1 cache-evict rbd_id.Test99
> > > nick@Ceph-Test:~$ rbd -p cache1 info Test99
> > > 2016-02-11 17:39:40.942030 7f0006eb3700 -1 librbd::image::OpenRequest: failed to retrieve image id: (95) Operation not supported
> > > 2016-02-11 17:39:40.942205 7f00066b2700 -1 librbd::ImageState: failed to open image: (95) Operation not supported
> > > rbd: error opening image Test99: (95) Operation not supported
> > > nick@Ceph-Test:~$ ceph osd pool set cache1 min_read_recency_for_promote 0
> > > set pool 12 min_read_recency_for_promote to 0
> > > nick@Ceph-Test:~$ rbd -p cache1 info Test99
> > > rbd image 'Test99':
> > >         size 10240 MB in 2560 objects
> > >         order 22 (4096 kB objects)
> > >         block_name_prefix: rbd_data.e8e734689a5e
> > >         format: 2
> > >         features: layering
> > >         flags:
> > >
> > > > -----Original Message-----
> > > > From: Nick Fisk [mailto:nick@xxxxxxxxxx]
> > > > Sent: 05 February 2016 19:58
> > > > To: 'Sage Weil' <sweil@xxxxxxxxxx>; 'Samuel Just' <sjust@xxxxxxxxxx>
> > > > Cc: 'Jason Dillaman' <dillaman@xxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx; ceph-devel@xxxxxxxxxxxxxxx
> > > > Subject: RE: cls_rbd ops on rbd_id.$name objects in EC pool
> > > >
> > > > > -----Original Message-----
> > > > > From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Sage Weil
> > > > > Sent: 05 February 2016 18:45
> > > > > To: Samuel Just <sjust@xxxxxxxxxx>
> > > > > Cc: Jason Dillaman <dillaman@xxxxxxxxxx>; Nick Fisk <nick@xxxxxxxxxx>; ceph-users@xxxxxxxxxxxxxx; ceph-devel@xxxxxxxxxxxxxxx
> > > > > Subject: Re: cls_rbd ops on rbd_id.$name objects in EC pool
> > > > >
> > > > > On Fri, 5 Feb 2016, Samuel Just wrote:
> > > > > > On Fri, Feb 5, 2016 at 7:53 AM, Jason Dillaman <dillaman@xxxxxxxxxx> wrote:
> > > > > > > #1 and #2 are awkward for existing pools since we would need
> > > > > > > a tool to inject dummy omap values within existing images.
> > > > > > > Can the cache tier force-promote it from the EC pool to the
> > > > > > > cache when an unsupported op is encountered? There is logic
> > > > > > > like that in jewel/master for handling the proxied writes.
> > > > >
> > > > > That sounded familiar but I couldn't find this in the code or
> > > > > history between infernalis and master. And then I went back and
> > > > > was unable to reproduce the problem on either the infernalis
> > > > > branch or v9.2.0.
> > > > >
> > > > > Nick, I was doing
> > > > >  1013  ./rbd -p ec create foo --size 10
> > > > >  1014  ./rbd -p ec info foo
> > > > >  1015  ./rados -p ec-cache cache-flush rbd_id.foo
> > > > >  1016  ./rados -p ec-cache cache-evict rbd_id.foo
> > > > >  1017  ./rbd -p ec info foo
> > > > >  1018  ./rados -p ec-cache ls -
> > > > >
> > > > > The rbd.get_id is successfully forcing a promotion.
> > > > >
> > > > > Which makes me think something else is going on... Nick, can you
> > > > > try to reproduce this with a userspace librbd client? 'rbd info'
> > > > > will do a few basic operations, but if that isn't problematic,
> > > > > try 'rbd bench-write' or 'rbd export', which will do real IO?
> > > >
> > > > Hi Sage,
> > > >
> > > > Just tried again and I can confirm it's definitely not working, but
> > > > I think I may have stumbled on the reason why.
> > > >
> > > > First apologies for not mentioning it before, but I am still
> > > > running that recency fix on Infernalis. Initially I thought this
> > > > was a flushing issue as I just assumed those objects shouldn't get
> > > > flushed out at all. But after reading your email where you said it
> > > > forced the promotion, it struck me that the broken recency
> > > > behaviour may have been masking this issue. With the fix it would
> > > > only promote if the object was hot enough, which it probably in
> > > > most cases wouldn't be. As a test I set my recency settings down
> > > > to 0 and tried the steps above again and this time it worked. Does
> > > > this make sense?
> > > >
> > > > Nick
> > > >
> > > > > sage
> > > > >
> > > > > > -Sam
> > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Jason Dillaman
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > >> From: "Sage Weil" <sweil@xxxxxxxxxx>
> > > > > > >> To: "Nick Fisk" <nick@xxxxxxxxxx>
> > > > > > >> Cc: "Jason Dillaman" <dillaman@xxxxxxxxxx>, ceph-users@xxxxxxxxxxxxxx, ceph-devel@xxxxxxxxxxxxxxx
> > > > > > >> Sent: Friday, February 5, 2016 10:42:17 AM
> > > > > > >> Subject: cls_rbd ops on rbd_id.$name objects in EC pool
> > > > > > >>
> > > > > > >> On Wed, 27 Jan 2016, Nick Fisk wrote:
> > > > > > >> >
> > > > > > >> > > -----Original Message-----
> > > > > > >> > > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
> > > > > > >> > > On Behalf Of Jason Dillaman
> > > > > > >> > > Sent: 27 January 2016 14:25
> > > > > > >> > > To: Nick Fisk <nick@xxxxxxxxxx>
> > > > > > >> > > Cc: ceph-users@xxxxxxxxxxxxxx
> > > > > > >> > > Subject: Re: Possible Cache Tier Bug - Can someone confirm
> > > > > > >> > >
> > > > > > >> > > Are you running with an EC pool behind the cache tier?
> > > > > > >> > > I know there was an issue with the first Infernalis
> > > > > > >> > > release where unsupported ops were being proxied down
> > > > > > >> > > to the EC pool, resulting in that same error.
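
(For reference, confirming the same setup on another cluster is quick: check that the base pool is erasure-coded with a writeback cache tier in front of it, whether the rbd_id object is still resident in the cache tier, and how much hit-set history a read needs before it promotes. A sketch only, with pool and image names as in Nick's examples.)

    # show pool types, tiering relationships and cache mode
    ceph osd dump | grep 'pool '

    # is the image's id object currently resident in the cache tier?
    rados -p cache1 ls | grep rbd_id

    # how many recent hit sets must contain the object before a read promotes it?
    ceph osd pool get cache1 min_read_recency_for_promote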
> > > > > > >> >
> > > > > > >> > Hi Jason, yes I am. 3x Replicated pool on top of an EC pool.
> > > > > > >> >
> > > > > > >> > It's probably something similar to what you mention.
> > > > > > >> > Either the client should be able to access the RBD header
> > > > > > >> > object on the base pool, or it should be flagged so that it
> > > > > > >> > can't be evicted.
> > > > > > >>
> > > > > > >> I just confirmed that the rbd_id.$name object doesn't have
> > > > > > >> any omap, so from rados's perspective, flushing and
> > > > > > >> evicting it is fine. But yeah, the cls_rbd ops aren't
> > > > > > >> permitted in the EC pool.
> > > > > > >>
> > > > > > >> In master/jewel we have a cache-pin function that prevents
> > > > > > >> an object from being flushed.
> > > > > > >>
> > > > > > >> A few options are:
> > > > > > >>
> > > > > > >> 1) Have cls_rbd cache-pin its objects.
> > > > > > >>
> > > > > > >> 2) Have cls_rbd put an omap key on the object to indirectly
> > > > > > >> do the same.
> > > > > > >>
> > > > > > >> 3) Add a requires-cls type object flag that keeps the object
> > > > > > >> out of an EC pool *until* it eventually supports cls ops.
> > > > > > >>
> > > > > > >> I'd lean toward 1 since it's simple and explicit, and when
> > > > > > >> we eventually make classes work we can remove the cache-pin
> > > > > > >> behavior from cls_rbd.
> > > > > > >> It's harder to fix in infernalis unless we also backport
> > > > > > >> cache-pin/unpin ops, too, so maybe #2 would be a simple
> > > > > > >> infernalis workaround?
> > > > > > >>
> > > > > > >> Jason? Sam?
> > > > > > >> sage
> > > > > > >>
> > > > > > >> > >
> > > > > > >> > > --
> > > > > > >> > >
> > > > > > >> > > Jason Dillaman
> > > > > > >> > >
> > > > > > >> > > ----- Original Message -----
> > > > > > >> > > > From: "Nick Fisk" <nick@xxxxxxxxxx>
> > > > > > >> > > > To: ceph-users@xxxxxxxxxxxxxx
> > > > > > >> > > > Sent: Wednesday, January 27, 2016 8:46:53 AM
> > > > > > >> > > > Subject: Possible Cache Tier Bug - Can someone confirm
> > > > > > >> > > >
> > > > > > >> > > > Hi All,
> > > > > > >> > > >
> > > > > > >> > > > I think I have stumbled on a bug. I'm running Infernalis
> > > > > > >> > > > (Kernel 4.4 on the client) and it seems that if the RBD
> > > > > > >> > > > header object gets evicted from the cache pool then you
> > > > > > >> > > > can no longer map it.
> > > > > > >> > > >
> > > > > > >> > > > Steps to reproduce
> > > > > > >> > > >
> > > > > > >> > > > rbd -p cache1 create Test --size=10G
> > > > > > >> > > > rbd -p cache1 map Test
> > > > > > >> > > >
> > > > > > >> > > > /dev/rbd1  <- Works!!
> > > > > > >> > > >
> > > > > > >> > > > rbd unmap /dev/rbd1
> > > > > > >> > > >
> > > > > > >> > > > rados -p cache1 cache-flush rbd_id.Test
> > > > > > >> > > > rados -p cache1 cache-evict rbd_id.Test
> > > > > > >> > > > rbd -p cache1 map Test
> > > > > > >> > > >
> > > > > > >> > > > rbd: sysfs write failed
> > > > > > >> > > > rbd: map failed: (95) Operation not supported
> > > > > > >> > > >
> > > > > > >> > > > or with the rbd-nbd client
> > > > > > >> > > >
> > > > > > >> > > > 2016-01-27 13:39:52.686770 7f9e54162b00 -1 asok(0x561837b88360) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17) File exists
> > > > > > >> > > > 2016-01-27 13:39:52.703987 7f9e32ffd700 -1 librbd::image::OpenRequest: failed to retrieve image id: (95) Operation not supported
> > > > > > >> > > > rbd-nbd: failed to map, status: (95) Operation not supported
> > > > > > >> > > > 2016-01-27 13:39:52.704138 7f9e327fc700 -1 librbd::ImageState: failed to open image: (95) Operation not supported
> > > > > > >> > > >
> > > > > > >> > > > Nick
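
(A footnote on Sage's option #2 above: it can be tried by hand against this exact reproduction. This is only a sketch of the experiment; the omap key name is made up, and the expected flush refusal is an assumption based on EC pools not being able to store omap.)

    # give the image's id object an omap key, as cls_rbd would under option #2
    rados -p cache1 setomapval rbd_id.Test dummy 1

    # flushing should now be refused (or skipped), because the EC base pool
    # cannot hold omap, so the object can no longer be evicted
    rados -p cache1 cache-flush rbd_id.Test

    # with the header's id object stuck in the cache tier, mapping keeps working
    rbd -p cache1 map Test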