For me: 3 nodes, 1 MON + 2x2TB OSDs on each node... no MDS used... I went through the pain of waiting for the data rebalancing and now I'm on "optimal" tunables...

Cheers

On 16 July 2014 14:29, Andrei Mikhailovsky <andrei at arhont.com> wrote:

> Quenten,
>
> We've got two monitors sitting on the OSD servers and one on a different
> server.
>
> Andrei
>
> --
> Andrei Mikhailovsky
> Director
> Arhont Information Security
>
> Web: http://www.arhont.com
> http://www.wi-foo.com
> Tel: +44 (0)870 4431337
> Fax: +44 (0)208 429 3111
> PGP: Key ID - 0x2B3438DE
> PGP: Server - keyserver.pgp.com
>
> DISCLAIMER
>
> The information contained in this email is intended only for the use of
> the person(s) to whom it is addressed and may be confidential or contain
> legally privileged information. If you are not the intended recipient you
> are hereby notified that any perusal, use, distribution, copying or
> disclosure is strictly prohibited. If you have received this email in
> error, please immediately advise us by return email at andrei at arhont.com
> and delete and purge the email and any attachments without making a copy.
>
> ------------------------------
> *From:* "Quenten Grasso" <qgrasso at onq.com.au>
> *To:* "Andrija Panic" <andrija.panic at gmail.com>, "Sage Weil" <sweil at redhat.com>
> *Cc:* ceph-users at lists.ceph.com
> *Sent:* Wednesday, 16 July, 2014 1:20:19 PM
> *Subject:* Re: [ceph-users] ceph osd crush tunables optimal AND add new
> OSD at the same time
>
> Hi Sage, Andrija & List
>
> I have seen the tunables issue on our cluster when I upgraded to Firefly.
>
> I ended up going back to the legacy settings after about an hour, as my
> cluster is 55 3TB OSDs over 5 nodes and it decided it needed to move
> around 32% of our data. After an hour all of our VMs were frozen, and I
> had to revert the change back to the legacy settings, wait about the same
> time again until the cluster had recovered, and then reboot our VMs.
> (Wasn't really expecting that one from the patch notes.)
>
> Also, our CPU usage went through the roof on our nodes. Do you perchance
> have your metadata servers co-located on your OSD nodes, as we do? I've
> been thinking about trying to move these to dedicated nodes, as it may
> resolve our issues.
>
> Regards,
>
> Quenten
>
> *From:* ceph-users [mailto:ceph-users-bounces at lists.ceph.com] *On Behalf
> Of* Andrija Panic
> *Sent:* Tuesday, 15 July 2014 8:38 PM
> *To:* Sage Weil
> *Cc:* ceph-users at lists.ceph.com
> *Subject:* Re: [ceph-users] ceph osd crush tunables optimal AND add new
> OSD at the same time
>
> Hi Sage,
>
> since this problem is tunables-related, do we need to expect the same
> behaviour, or not, when we do a regular data rebalance caused by adding
> new or removing OSDs? I guess not, but I would like your confirmation.
>
> I'm already on optimal tunables, but I'm afraid to test this by e.g.
> shutting down 1 OSD.
>
> Thanks,
> Andrija
>
> On 14 July 2014 18:18, Sage Weil <sweil at redhat.com> wrote:
>
> I've added some additional notes/warnings to the upgrade and release
> notes:
>
> https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451
>
> If there is somewhere else where you think a warning flag would be
> useful, let me know!
>
> Generally speaking, we want to be able to cope with huge data rebalances
> without interrupting service. It's an ongoing process of improving the
> recovery vs client prioritization, though, and removing sources of
> overhead related to rebalancing...
> and it's clearly not perfect yet. :/
>
> sage
>
>
> On Sun, 13 Jul 2014, Andrija Panic wrote:
>
> > Hi,
> >
> > after the ceph upgrade (0.72.2 to 0.80.3) I issued "ceph osd crush
> > tunables optimal", and after only a few minutes I added 2 more OSDs to
> > the CEPH cluster...
> >
> > So these 2 changes were more or less done at the same time - rebalancing
> > because of tunables optimal, and rebalancing because of adding new OSDs...
> >
> > Result: all VMs living on CEPH storage went mad - effectively no disk
> > access, blocked so to speak.
> >
> > Since this rebalancing took 5h-6h, I had a bunch of VMs down for that
> > long...
> >
> > Did I do wrong by causing "2 rebalancings" to happen at the same time?
> > Is this behaviour normal, to cause great load on all VMs because they
> > are unable to access CEPH storage effectively?
> >
> > Thanks for any input...
> >
> > --
> > Andrija Panić
> > --------------------------------------
> > http://admintweets.com
> > --------------------------------------
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Andrija Panić
--------------------------------------
http://admintweets.com
--------------------------------------
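
For reference, a minimal sketch of how the tunables change discussed above
might be staged with recovery throttled so client I/O keeps priority. This
is not a procedure given in the thread; it assumes a Firefly-era (0.80.x)
cluster, and the values are illustrative only:

    # Throttle backfill/recovery before touching tunables so client I/O
    # keeps priority over rebalance traffic (illustrative values, not
    # recommendations from the list).
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

    # Change tunables on their own; avoid adding or removing OSDs until
    # the cluster is back to HEALTH_OK.
    ceph osd crush tunables optimal
    ceph -w    # watch recovery progress

    # The fallback Quenten used when client impact was unacceptable:
    ceph osd crush tunables legacy

For Andrija's question about testing the effect of losing a single OSD, one
cautious approach (again, an assumption, not advice from the thread) is to
set noout first so the test does not itself trigger a rebalance:

    ceph osd set noout
    # stop one OSD daemon, observe client I/O, then start it again
    ceph osd unset noout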