Re: ceph hammer : rbd info/Status : operation not supported (95) (EC+RBD tier pools)

Hello Robert,

Thanks for the speedy reply. 

On Wed, 24 Feb 2016 22:44:47 -0700 Robert LeBlanc wrote:

> We have not seen this issue, but we don't run EC pools yet (we are
> waiting for multiple layers to be available). 

Yeah, that seems to be the consensus here: only setups with an EC base
pool are affected.
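
For anyone else running into this, Jason's check and workaround (quoted
further down) boil down to something like the following; the pool and
image names here are made up, adjust to your setup:

  # symptom: rbd info fails against the tiered pool
  rbd -p base-pool info my-image

  # check whether the image header object ever made it into the cache pool
  rados -p cache-pool ls | grep rbd_id.my-image

  # if it is missing, force-promote it with a dummy omap value
  rados -p base-pool setomapval rbd_id.my-image dummy value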

> We are not running 0.94.6
> in production yet either. We have adopted the policy to only run released
> versions in production unless there is a really pressing need to have a
> patch. 

Well, it has been released since Wednesday. ^o^

But then again, we don't update things here either unless we're hitting a
bug.
If it weren't for needing a working cache tier, we'd still be on Firefly.

> We are running 0.94.6 through our alpha and staging clusters and
> hoping to do the upgrade in the next couple of weeks. We won't know how
> much the recency fix will help until then because we have not been able
> to replicate our workload with fio accurately enough to get good test
> results. 

I had/have high hopes for a working recency check, as it should keep the
cache from filling up with cold objects, which in turn forces it to evict
things and pound the base pool again.
Alas, I just found that write recency isn't supported with Hammer; see
another mail soon.
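
For reference, the read-side knob does exist in Hammer and is, as far as I
can tell, what the 0.94.6 fix is about; a sketch with a made-up cache pool
name:

  # only promote objects seen in at least 2 of the recent hit sets
  ceph osd pool set cache-pool min_read_recency_for_promote 2
  # the hit set bookkeeping the recency check relies on
  ceph osd pool set cache-pool hit_set_count 4
  ceph osd pool set cache-pool hit_set_period 1200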

> Unfortunately we will probably be swapping out our M600s with
> S3610s. We've burned through 30% of the life in 2 months and they have
> 8x the op latency. 

Ouch, that's quite the wear-out.
Aside from the SSDs having insufficient endurance, what level of
write-amplification are you seeing on average?
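
For reference, the usual way to eyeball that is to compare the host-writes
counter against the NAND/FTL-writes counter in SMART; attribute names and
IDs differ per vendor, so treat this as a sketch only:

  # dump the raw attribute table, then divide NAND/FTL writes by host writes
  smartctl -A /dev/sdX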

As for the 3610s, make sure to update their firmware before deployment. 
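
If memory serves, Intel's isdct tool handles that in one go; roughly, and
assuming the drive shows up as index 0:

  isdct show -intelssd      # list drives and current firmware versions
  isdct load -intelssd 0    # flash the newer firmware onto drive 0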

> Due to the 10 Minutes of Terror, we are going to have
> to do both at the same time to reduce the impact. Luckily, when you have
> weighted out OSDs or empty ones, it is much less impactful. If you get
> your upgrade done before ours, I'd like to know how it went. I'll be
> posting the results from ours when it is done.
> 
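For the archives: weighting one out ahead of time is just something along
these lines (made-up OSD id), then waiting for the rebalance to finish
before touching the hardware:

  ceph osd crush reweight osd.12 0
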
I think I'll pass on 0.94.6 for the moment, as I seem to have found
another bug; more on that later if I can confirm it.
Right now I'm rebooting my entire test cluster to make sure this isn't a
residual effect from doing multiple upgrades without ever rebooting the
nodes.

Christian

> Sent from a mobile device, please excuse any typos.
> On Feb 24, 2016 5:43 PM, "Christian Balzer" <chibi@xxxxxxx> wrote:
> 
> >
> > Hello Jason (Ceph devs et al),
> >
> > On Wed, 24 Feb 2016 13:15:34 -0500 (EST) Jason Dillaman wrote:
> >
> > > If you run "rados -p <cache pool> ls | grep "rbd_id.<yyy-disk1>" and
> > > don't see that object, you are experiencing that issue [1].
> > >
> > > You can attempt to work around this issue by running "rados -p
> > > irfu-virt setomapval rbd_id.<yyy-disk1> dummy value" to
> > > force-promote the object to the cache pool.  I haven't tested /
> > > verified that will alleviate the issue, though.
> > >
> > > [1] http://tracker.ceph.com/issues/14762
> > >
> >
> > This concerns me greatly, as I'm about to phase in a cache tier this
> > weekend into a very busy, VERY mission critical Ceph cluster.
> > That is on top of a replicated base pool, running Hammer.
> >
> > That issue and the related git blurb are less than crystal clear, so
> > for my and everybody else's benefit could you elaborate a bit more on
> > this?
> >
> > 1. Does this only affect EC base pools?
> > 2. Is this a regression of sorts, and when did it come about?
> >    I have a hard time imagining people not running into this earlier,
> >    unless that problem is very hard to trigger.
> > 3. One assumes that this isn't fixed in any released version of Ceph,
> >    correct?
> >
> > Robert, sorry for CC'ing you, but AFAICT your cluster is about the
> > closest approximation in terms of busyness to mine here.
> > And I assume that you're not using EC pools (since you need
> > performance, not space) and haven't experienced this bug at all?
> >
> > Also, would you consider the benefits of the recency fix (thanks for
> > that) worth the risk of being an early adopter of 0.94.6?
> > In other words, are you eating your own dog food already and 0.94.6
> > hasn't eaten your data babies yet? ^o^
> >
> > Regards,
> >
> > Christian
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/
> >


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/


