With 34 x 4TB OSDs over 4 hosts, about half full, I had 30% of objects
moved and it took around 12 hours. Except now I can't use the kclient any
more - wish I'd read that first.

On 16 July 2014 13:36, Andrija Panic <andrija.panic at gmail.com> wrote:

> For me, 3 nodes, 1 MON + 2 x 2TB OSDs on each node... no MDS used...
> I went through the pain of waiting for data rebalancing and am now on
> "optimal" tunables...
> Cheers
>
>
> On 16 July 2014 14:29, Andrei Mikhailovsky <andrei at arhont.com> wrote:
>
>> Quenten,
>>
>> We've got two monitors sitting on the OSD servers and one on a
>> different server.
>>
>> Andrei
>>
>> --
>> Andrei Mikhailovsky
>> Director
>> Arhont Information Security
>>
>> Web: http://www.arhont.com
>> http://www.wi-foo.com
>> Tel: +44 (0)870 4431337
>> Fax: +44 (0)208 429 3111
>> PGP: Key ID - 0x2B3438DE
>> PGP: Server - keyserver.pgp.com
>>
>> DISCLAIMER
>>
>> The information contained in this email is intended only for the use of
>> the person(s) to whom it is addressed and may be confidential or contain
>> legally privileged information. If you are not the intended recipient
>> you are hereby notified that any perusal, use, distribution, copying or
>> disclosure is strictly prohibited. If you have received this email in
>> error, please immediately advise us by return email at
>> andrei at arhont.com and delete and purge the email and any attachments
>> without making a copy.
>>
>> ------------------------------
>> From: "Quenten Grasso" <qgrasso at onq.com.au>
>> To: "Andrija Panic" <andrija.panic at gmail.com>, "Sage Weil" <sweil at redhat.com>
>> Cc: ceph-users at lists.ceph.com
>> Sent: Wednesday, 16 July, 2014 1:20:19 PM
>> Subject: Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time
>>
>> Hi Sage, Andrija & list,
>>
>> I have seen the tunables issue on our cluster when I upgraded to Firefly.
>>
>> I ended up going back to legacy settings after about an hour. Our
>> cluster is 55 x 3TB OSDs over 5 nodes, and it decided it needed to move
>> around 32% of our data; after an hour all of our VMs were frozen, so I
>> had to revert the change back to legacy settings and wait about the same
>> time again until our cluster had recovered and we could reboot our VMs.
>> (Wasn't really expecting that one from the patch notes.)
>>
>> Our CPU usage also went through the roof on our nodes. Do you by any
>> chance have your metadata servers co-located on your OSD nodes, as we
>> do? I've been thinking about moving these to dedicated nodes, as it may
>> resolve our issues.
>>
>> Regards,
>> Quenten
>>
>> From: ceph-users [mailto:ceph-users-bounces at lists.ceph.com] On Behalf Of Andrija Panic
>> Sent: Tuesday, 15 July 2014 8:38 PM
>> To: Sage Weil
>> Cc: ceph-users at lists.ceph.com
>> Subject: Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time
>>
>> Hi Sage,
>>
>> Since this problem is tunables-related, should we expect the same
>> behaviour when we do a regular data rebalance caused by adding or
>> removing an OSD? I guess not, but I would like your confirmation.
>>
>> I'm already on optimal tunables, but I'm afraid to test this by, for
>> example, shutting down one OSD.
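
To answer that bit inline, since it is exactly what I was worried about
before my own reshuffle: my understanding is that losing or adding a single
OSD only remaps the PGs that touch that OSD, nothing like the cluster-wide
shuffle a tunables change triggers. If you do want to test by stopping one
OSD, this is roughly what I'd try first - only a sketch of the knobs as I
understand them from the docs, not something I've verified under real VM
load:

  # keep the cluster from marking the OSD out and re-replicating
  # while it is briefly down
  ceph osd set noout

  # throttle backfill/recovery so client I/O keeps breathing
  # (the firefly defaults are considerably higher)
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

  # ... stop the OSD, run the test, start it again ...

  ceph osd unset noout

Whether throttling like that would have been enough to keep Quenten's VMs
responsive through a 32% reshuffle I can't say, but it is where I would
start before the next big rebalance.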
>>
>> Thanks,
>> Andrija
>>
>> On 14 July 2014 18:18, Sage Weil <sweil at redhat.com> wrote:
>>
>> I've added some additional notes/warnings to the upgrade and release
>> notes:
>>
>> https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451
>>
>> If there is somewhere else where you think a warning flag would be
>> useful, let me know!
>>
>> Generally speaking, we want to be able to cope with huge data rebalances
>> without interrupting service. It's an ongoing process of improving the
>> recovery vs client prioritization, though, and removing sources of
>> overhead related to rebalancing... and it's clearly not perfect yet. :/
>>
>> sage
>>
>> On Sun, 13 Jul 2014, Andrija Panic wrote:
>>
>> > Hi,
>> > after the Ceph upgrade (0.72.2 to 0.80.3) I issued "ceph osd crush
>> > tunables optimal", and after only a few minutes I added 2 more OSDs
>> > to the Ceph cluster...
>> >
>> > So these 2 changes were done more or less at the same time -
>> > rebalancing because of tunables optimal, and rebalancing because of
>> > adding new OSDs...
>> >
>> > Result - all VMs living on Ceph storage went mad: effectively no disk
>> > access, blocked so to speak.
>> >
>> > Since this rebalancing took 5-6 hours, I had a bunch of VMs down for
>> > that long...
>> >
>> > Did I do wrong by causing "2 rebalancings" to happen at the same time?
>> > Is this behaviour normal, to cause great load on all VMs because they
>> > are unable to access Ceph storage effectively?
>> >
>> > Thanks for any input...
>> > --
>> > Andrija Panić
>>
>> --
>> Andrija Panić
>> --------------------------------------
>> http://admintweets.com
>> --------------------------------------
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> --
> Andrija Panić
> --------------------------------------
> http://admintweets.com
> --------------------------------------
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Mean Trading Systems LLP
http://www.meantradingsystems.com
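
P.S. On the kclient breakage above: as far as I can tell, the kernel client
simply does not understand the newer CRUSH tunables until the client kernel
is recent enough, so the choices seem to be a newer kernel on the clients or
backing the profile off again, at the cost of another big rebalance. This is
only my reading of the release notes, not tested advice:

  # see which tunables the cluster is currently using
  ceph osd crush show-tunables

  # back off to the pre-firefly behaviour (triggers another rebalance)
  ceph osd crush tunables legacy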