Re: ceph hammer : rbd info/Status : operation not supported (95) (EC+RBD tier pools)

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Jason Dillaman
> Sent: 25 February 2016 01:30
> To: Christian Balzer <chibi@xxxxxxx>
> Cc: ceph-users@xxxxxxxx
> Subject: Re:  ceph hammer : rbd info/Status : operation not
> supported (95) (EC+RBD tier pools)
> 
> I'll speak to what I can answer off the top of my head.  The most important
> point is that this issue is only related to EC pool base tiers, not
> replicated pools.
> 
> > Hello Jason (Ceph devs et al),
> >
> > On Wed, 24 Feb 2016 13:15:34 -0500 (EST) Jason Dillaman wrote:
> >
> > > If you run "rados -p <cache pool> ls | grep rbd_id.<yyy-disk1>" and
> > > don't see that object, you are experiencing that issue [1].
> > >
> > > You can attempt to work around this issue by running "rados -p
> > > irfu-virt setomapval rbd_id.<yyy-disk1> dummy value" to
> > > force-promote the object to the cache pool.  I haven't tested /
> > > verified that will alleviate the issue, though.
> > >
> > > [1] http://tracker.ceph.com/issues/14762
> > >
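
Concretely, with made-up names ("cachepool" for the cache tier and
"yyy-disk1" for the image), the check and the untested force-promote would
look something like this:

    # is the image's ID object present in the cache pool?
    rados -p cachepool ls | grep rbd_id.yyy-disk1

    # if nothing comes back, try force-promoting it by writing a dummy omap key
    rados -p cachepool setomapval rbd_id.yyy-disk1 dummy value

After that, retrying "rbd info" against the image should show whether the
"operation not supported (95)" error has gone away.
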
> >
> > This concerns me greatly, as I'm about to phase a cache tier into a very
> > busy, VERY mission-critical Ceph cluster this weekend.
> > That is on top of a replicated pool, running Hammer.
> >
> > That issue and the related git blurb are less than crystal clear, so
> > for my and everybody else's benefit could you elaborate a bit more on
> > this?
> >
> > 1. Does this only affect EC base pools?
> 
> Correct -- this is only an issue because EC pools do not directly support
> several operations required by RBD.  Placing a replicated cache tier in
> front of an EC pool was, in effect, a work-around to this limitation.
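
For context, that work-around is the usual cache tiering arrangement, set up
roughly along these lines (pool names here are only examples):

    # put a replicated cache pool in front of an erasure-coded base pool
    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    ceph osd tier set-overlay ecpool cachepool
    # the cache tier needs a hit set to track object access
    ceph osd pool set cachepool hit_set_type bloom

Clients keep pointing at the base pool, and the cache tier services the
operations the EC pool can't handle directly (omap, for example).
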
> 
> > 2. Is this a regression of sorts, and when did it come about?
> >    I have a hard time imagining people not running into this earlier,
> >    unless that problem is very hard to trigger.
> > 3. One assumes that this isn't fixed in any released version of Ceph,
> >    correct?
> >
> > Robert, sorry for CC'ing you, but AFAICT your cluster is about the
> > closest approximation in terms of busyness to mine here.
> > And I assume that you're not using EC pools (since you need
> > performance, not space) and haven't experienced this bug at all?
> >
> > Also, would you consider the benefits of the recency fix (thanks for
> > that) worth the risk of being an early adopter of 0.94.6?
> > In other words, are you eating your own dog food already and 0.94.6
> > hasn't eaten your data babies yet? ^o^
> 
> Per the referenced email chain, it was potentially the recency fix that
> exposed this issue for EC pools fronted by a cache tier.

Just to add: it's possible this bug was present for a while, but the broken
recency logic effectively always promoted blocks regardless. Once that was
fixed and Ceph could actually decide whether a block needed to be promoted
or not, the bug surfaced. You can always set the recency to 0 (possibly 1)
to get the same behaviour as before the recency fix and make sure you won't
hit this bug.
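
On Hammer that would be something like this (the cache pool name is just an
example):

    # effectively promote on first read again, as the pre-0.94.6 logic did
    ceph osd pool set cachepool min_read_recency_for_promote 0

That takes the recency decision out of the picture until the promotion bug
on EC-backed tiers is sorted out.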

> 
> >
> > Regards,
> >
> > Christian
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/
> >
> 
> --
> 
> Jason Dillaman

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


