I'll speak to what I can answer off the top of my head.  The most important
point is that this issue only affects EC pool base tiers, not replicated
pools.

> Hello Jason (Ceph devs et al),
>
> On Wed, 24 Feb 2016 13:15:34 -0500 (EST) Jason Dillaman wrote:
>
> > If you run "rados -p <cache pool> ls | grep "rbd_id.<yyy-disk1>" and
> > don't see that object, you are experiencing that issue [1].
> >
> > You can attempt to work around this issue by running "rados -p irfu-virt
> > setomapval rbd_id.<yyy-disk1> dummy value" to force-promote the object
> > to the cache pool.  I haven't tested / verified that will alleviate the
> > issue, though.
> >
> > [1] http://tracker.ceph.com/issues/14762
>
> This concerns me greatly, as I'm about to phase in a cache tier this
> weekend on a very busy, VERY mission-critical Ceph cluster.
> It sits on top of a replicated pool, Hammer.
>
> That issue and the related git blurb are less than crystal clear, so for
> my and everybody else's benefit, could you elaborate a bit more on this?
>
> 1. Does this only affect EC base pools?

Correct -- this is only an issue because EC pools do not directly support
several operations required by RBD.  Placing a replicated cache tier in
front of an EC pool was, in effect, a work-around for this limitation.

> 2. Is this a regression of sorts, and when did it come about?
> I have a hard time imagining people not running into this earlier,
> unless the problem is very hard to trigger.
>
> 3. One assumes that this isn't fixed in any released version of Ceph,
> correct?
>
> Robert, sorry for CC'ing you, but AFAICT your cluster is about the closest
> approximation in terms of busyness to mine here.
> And I assume that you're neither using EC pools (since you need
> performance, not space) nor have experienced this bug at all?
>
> Also, would you consider the benefits of the recency fix (thanks for
> that) worth the risk of being an early adopter of 0.94.6?
> In other words, are you eating your own dog food already, and 0.94.6
> hasn't eaten your data babies yet? ^o^

Per the referenced email chain, it was potentially the recency fix that
exposed this issue for EC pools fronted by a cache tier.

> Regards,
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

--

Jason Dillaman
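P.S. The check-then-promote workaround quoted above could be scripted roughly
as follows.  This is only a sketch of the commands already given in the
thread -- untested, per my earlier caveat -- and the pool and image names
are placeholders you would substitute for your own:

```shell
#!/bin/sh
# Sketch of the workaround from http://tracker.ceph.com/issues/14762.
# POOL and IMAGE are placeholders; substitute your cache pool and RBD image.
POOL="irfu-virt"
IMAGE="yyy-disk1"
OBJ="rbd_id.${IMAGE}"

# Check whether the image's rbd_id object is present in the cache pool.
if rados -p "$POOL" ls | grep -q -F "$OBJ"; then
    echo "$OBJ already present in $POOL -- likely not affected"
else
    # Force-promote the object by writing a dummy omap key/value.
    echo "$OBJ missing from $POOL -- attempting force-promote"
    rados -p "$POOL" setomapval "$OBJ" dummy value
fi
```

Again, whether the force-promote actually alleviates the issue has not been
verified; treat it as a diagnostic starting point, not a fix.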