I'll speak to what I can answer off the top of my head.  The most important
point is that this issue only affects EC pool base tiers, not replicated
pools.

> Hello Jason (Ceph devs et al),
>
> On Wed, 24 Feb 2016 13:15:34 -0500 (EST) Jason Dillaman wrote:
>
> > If you run "rados -p <cache pool> ls | grep "rbd_id.<yyy-disk1>" and
> > don't see that object, you are experiencing that issue [1].
> >
> > You can attempt to work around this issue by running "rados -p irfu-virt
> > setomapval rbd_id.<yyy-disk1> dummy value" to force-promote the object
> > to the cache pool.  I haven't tested / verified that will alleviate the
> > issue, though.
> >
> > [1] http://tracker.ceph.com/issues/14762
>
> This concerns me greatly, as I'm about to phase in a cache tier this
> weekend on a very busy, VERY mission-critical Ceph cluster.
> It sits on top of a replicated pool, Hammer.
>
> That issue and the related git blurb are less than crystal clear, so for
> my and everybody else's benefit, could you elaborate a bit more on this?
>
> 1. Does this only affect EC base pools?

Correct -- this is only an issue because EC pools do not directly support
several operations required by RBD.  Placing a replicated cache tier in
front of an EC pool was, in effect, a work-around for this limitation.

> 2. Is this a regression of sorts, and when did it come about?
> I have a hard time imagining people not running into this earlier,
> unless the problem is very hard to trigger.
>
> 3. One assumes that this isn't fixed in any released version of Ceph,
> correct?
>
> Robert, sorry for CC'ing you, but AFAICT your cluster is about the closest
> approximation in terms of busyness to mine here.
> And I assume that you're neither using EC pools (since you need
> performance, not space) nor have experienced this bug at all?
>
> Also, would you consider the benefits of the recency fix (thanks for
> that) worth the risk of being an early adopter of 0.94.6?
> In other words, are you eating your own dog food already, and 0.94.6
> hasn't eaten your data babies yet? ^o^

Per the referenced email chain, it was potentially the recency fix that
exposed this issue for EC pools fronted by a cache tier.

> Regards,
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

--

Jason Dillaman
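P.S. The check-then-promote workaround quoted above could be scripted roughly
as follows.  This is only a sketch of the commands already given in the
thread -- untested, per my earlier caveat -- and the pool and image names
are placeholders you would substitute for your own:

```shell
#!/bin/sh
# Sketch of the workaround from http://tracker.ceph.com/issues/14762.
# POOL and IMAGE are placeholders; substitute your cache pool and RBD image.
POOL="irfu-virt"
IMAGE="yyy-disk1"
OBJ="rbd_id.${IMAGE}"

# Check whether the image's rbd_id object is present in the cache pool.
if rados -p "$POOL" ls | grep -q -F "$OBJ"; then
    echo "$OBJ already present in $POOL -- likely not affected"
else
    # Force-promote the object by writing a dummy omap key/value.
    echo "$OBJ missing from $POOL -- attempting force-promote"
    rados -p "$POOL" setomapval "$OBJ" dummy value
fi
```

Again, whether the force-promote actually alleviates the issue has not been
verified; treat it as a diagnostic starting point, not a fix.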