For me: 3 nodes, 1 MON + 2x2TB OSDs on each node... no MDS used... I went through the pain of waiting for the data rebalancing and now I'm on "optimal" tunables...

Cheers

On 16 July 2014 14:29, Andrei Mikhailovsky <andrei at arhont.com> wrote:

> Quenten,
>
> We've got two monitors sitting on the OSD servers and one on a different
> server.
>
> Andrei
>
> --
> Andrei Mikhailovsky
> Director
> Arhont Information Security
>
> Web: http://www.arhont.com
> http://www.wi-foo.com
> Tel: +44 (0)870 4431337
> Fax: +44 (0)208 429 3111
> PGP: Key ID - 0x2B3438DE
> PGP: Server - keyserver.pgp.com
>
> DISCLAIMER
>
> The information contained in this email is intended only for the use of
> the person(s) to whom it is addressed and may be confidential or contain
> legally privileged information. If you are not the intended recipient you
> are hereby notified that any perusal, use, distribution, copying or
> disclosure is strictly prohibited. If you have received this email in
> error, please immediately advise us by return email at andrei at arhont.com
> and delete and purge the email and any attachments without making a copy.
>
> ------------------------------
> *From:* "Quenten Grasso" <qgrasso at onq.com.au>
> *To:* "Andrija Panic" <andrija.panic at gmail.com>, "Sage Weil" <sweil at redhat.com>
> *Cc:* ceph-users at lists.ceph.com
> *Sent:* Wednesday, 16 July, 2014 1:20:19 PM
> *Subject:* Re: [ceph-users] ceph osd crush tunables optimal AND add new
> OSD at the same time
>
> Hi Sage, Andrija & List
>
> I have seen the tunables issue on our cluster when I upgraded to Firefly.
>
> I ended up going back to the legacy settings after about an hour, as my
> cluster is 55 3TB OSDs over 5 nodes and it decided it needed to move
> around 32% of our data. After an hour all of our VMs were frozen, and I
> had to revert the change back to the legacy settings, wait about the same
> time again until the cluster had recovered, and then reboot our VMs.
> (Wasn't really expecting that one from the patch notes.)
>
> Also, our CPU usage went through the roof on our nodes. Do you perchance
> have your metadata servers co-located on your OSD nodes, as we do? I've
> been thinking about trying to move these to dedicated nodes, as it may
> resolve our issues.
>
> Regards,
>
> Quenten
>
> *From:* ceph-users [mailto:ceph-users-bounces at lists.ceph.com] *On Behalf
> Of* Andrija Panic
> *Sent:* Tuesday, 15 July 2014 8:38 PM
> *To:* Sage Weil
> *Cc:* ceph-users at lists.ceph.com
> *Subject:* Re: [ceph-users] ceph osd crush tunables optimal AND add new
> OSD at the same time
>
> Hi Sage,
>
> since this problem is tunables-related, do we need to expect the same
> behaviour, or not, when we do a regular data rebalance caused by adding
> new or removing OSDs? I guess not, but I would like your confirmation.
>
> I'm already on optimal tunables, but I'm afraid to test this by e.g.
> shutting down 1 OSD.
>
> Thanks,
> Andrija
>
> On 14 July 2014 18:18, Sage Weil <sweil at redhat.com> wrote:
>
> I've added some additional notes/warnings to the upgrade and release
> notes:
>
> https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451
>
> If there is somewhere else where you think a warning flag would be
> useful, let me know!
>
> Generally speaking, we want to be able to cope with huge data rebalances
> without interrupting service. It's an ongoing process of improving the
> recovery vs client prioritization, though, and removing sources of
> overhead related to rebalancing...
> and it's clearly not perfect yet. :/
>
> sage
>
>
> On Sun, 13 Jul 2014, Andrija Panic wrote:
>
> > Hi,
> >
> > after the ceph upgrade (0.72.2 to 0.80.3) I issued "ceph osd crush
> > tunables optimal", and after only a few minutes I added 2 more OSDs to
> > the CEPH cluster...
> >
> > So these 2 changes were more or less done at the same time - rebalancing
> > because of tunables optimal, and rebalancing because of adding new OSDs...
> >
> > Result: all VMs living on CEPH storage went mad - effectively no disk
> > access, blocked so to speak.
> >
> > Since this rebalancing took 5h-6h, I had a bunch of VMs down for that
> > long...
> >
> > Did I do wrong by causing "2 rebalancings" to happen at the same time?
> > Is this behaviour normal, to cause great load on all VMs because they
> > are unable to access CEPH storage effectively?
> >
> > Thanks for any input...
> >
> > --
> > Andrija Panić
> > --------------------------------------
> > http://admintweets.com
> > --------------------------------------
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Andrija Panić
--------------------------------------
http://admintweets.com
--------------------------------------
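
For reference, a minimal sketch of how the tunables change discussed above
might be staged with recovery throttled so client I/O keeps priority. This
is not a procedure given in the thread; it assumes a Firefly-era (0.80.x)
cluster, and the values are illustrative only:

    # Throttle backfill/recovery before touching tunables so client I/O
    # keeps priority over rebalance traffic (illustrative values, not
    # recommendations from the list).
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

    # Change tunables on their own; avoid adding or removing OSDs until
    # the cluster is back to HEALTH_OK.
    ceph osd crush tunables optimal
    ceph -w    # watch recovery progress

    # The fallback Quenten used when client impact was unacceptable:
    ceph osd crush tunables legacy

For Andrija's question about testing the effect of losing a single OSD, one
cautious approach (again, an assumption, not advice from the thread) is to
set noout first so the test does not itself trigger a rebalance:

    ceph osd set noout
    # stop one OSD daemon, observe client I/O, then start it again
    ceph osd unset noout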