Was that just a driver issue? If so, could we face the same kind of issue on different distributed file systems? I'm just asking. I'm quite interested in:

- what kind of HBA you are using
- which version of the driver caused the issue

Does any Cepher have any comment on Mariusz's findings?

Shinobu

----- Original Message -----
From: "Mariusz Gronczewski" <mariusz.gronczewski@xxxxxxxxxxxx>
To: "池信泽" <xmdxcxz@xxxxxxxxx>
Cc: "Shinobu Kinjo" <skinjo@xxxxxxxxxx>, ceph-users@xxxxxxxxxxxxxx
Sent: Tuesday, September 8, 2015 7:09:32 PM
Subject: Re: Huge memory usage spike in OSD on hammer/giant

For those interested: the bug that caused Ceph to go haywire was an Emulex NIC driver dropping packets when pushing more than a few hundred megabits (basically a linear change with load). That caused OSDs to flap constantly once something went wrong (high traffic, an OSD goes down, Ceph starts reallocating data, which causes more traffic, more OSDs flap, etc.). Upgrading the kernel to 4.1.6 (the bug was present at least in 4.0.1 and in the C6 "distro" kernel) fixed that, and the cluster started rebuilding correctly.

Lessons learned: buy Intel NICs...
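For anyone who wants to check whether their own NIC driver is misbehaving in a similar way, the driver and firmware versions and the per-interface drop counters can be read with standard tools. This is only a sketch; eth0 below is a placeholder for the actual cluster/public interface:

    ethtool -i eth0                  # driver name, driver version, firmware version
    ethtool -S eth0 | grep -i drop   # driver statistics; counter names vary by driver
    ip -s link show eth0             # kernel-level RX/TX packet and drop counters
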
On Mon, 7 Sep 2015 20:51:57 +0800, 池信泽 <xmdxcxz@xxxxxxxxx> wrote:

> Yeah, there is a bug which can use a huge amount of memory. It is triggered when an
> OSD goes down or is added to the cluster and does recovery/backfilling.
>
> The patches https://github.com/ceph/ceph/pull/5656 and
> https://github.com/ceph/ceph/pull/5451 merged into master fix it, and
> they will be backported.
>
> I think Ceph v0.93 or newer versions may hit this bug.
>
> 2015-09-07 20:42 GMT+08:00 Shinobu Kinjo <skinjo@xxxxxxxxxx>:
>
> > How heavy was the network traffic?
> >
> > Have you tried to capture the traffic on the cluster and public networks
> > to see where such a large amount of traffic came from?
> >
> > Shinobu
> >
> > ----- Original Message -----
> > From: "Jan Schermer" <jan@xxxxxxxxxxx>
> > To: "Mariusz Gronczewski" <mariusz.gronczewski@xxxxxxxxxxxx>
> > Cc: ceph-users@xxxxxxxxxxxxxx
> > Sent: Monday, September 7, 2015 9:17:04 PM
> > Subject: Re: Huge memory usage spike in OSD on hammer/giant
> >
> > Hmm, even network traffic went up.
> > Nothing in the logs on the mons around when it started, 9/4 ~6 AM?
> >
> > Jan
> >
> > > On 07 Sep 2015, at 14:11, Mariusz Gronczewski <mariusz.gronczewski@xxxxxxxxxxxx> wrote:
> > >
> > > On Mon, 7 Sep 2015 13:44:55 +0200, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> > >
> > >> Maybe some configuration change occurred that now takes effect when you start the OSD?
> > >> Not sure what could affect memory usage though - some ulimit values maybe (stack size),
> > >> the number of OSD threads (compare the number from this OSD to the rest of the OSDs),
> > >> fd cache size. Look in /proc and compare everything.
> > >> Also look at "ceph osd tree" - didn't someone touch it while you were gone?
> > >>
> > >> Jan
> > >>
> > >
> > >> the number of OSD threads (compare the number from this OSD to the rest of the OSDs),
> > >
> > > it occurred on all OSDs, and it looked like this:
> > > http://imgur.com/IIMIyRG
> > >
> > > sadly I was on vacation so I didn't manage to catch it earlier ;/ but I'm
> > > sure there was no config change
> > >
> > >
> > >>> On 07 Sep 2015, at 13:40, Mariusz Gronczewski <mariusz.gronczewski@xxxxxxxxxxxx> wrote:
> > >>>
> > >>> On Mon, 7 Sep 2015 13:02:38 +0200, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> > >>>
> > >>>> Apart from a bug causing this, it could be caused by a failure of other OSDs
> > >>>> (even a temporary one) that starts backfills:
> > >>>>
> > >>>> 1) something fails
> > >>>> 2) some PGs move to this OSD
> > >>>> 3) this OSD has to allocate memory for all the PGs
> > >>>> 4) whatever failed comes back up
> > >>>> 5) the memory is never released.
> > >>>>
> > >>>> A similar scenario is possible if, for example, someone confuses "ceph osd crush reweight"
> > >>>> with "ceph osd reweight" (yes, this happened to me :-)).
> > >>>>
> > >>>> Did you try just restarting the OSD before you upgraded it?
> > >>>
> > >>> stopped, upgraded, started. It helped a bit (<3 GB per OSD) but besides
> > >>> that nothing changed. I've tried waiting until it stops eating CPU and then
> > >>> restarting it, but it still eats >2 GB of memory, which means I can't start
> > >>> all 4 OSDs at the same time ;/
> > >>>
> > >>> I've also added the noin, nobackfill and norecover flags but that didn't help
> > >>>
> > >>> it is surprising to me because before, all 4 OSDs together ate less than
> > >>> 2 GB of memory, so I thought I had enough headroom, and we did restart
> > >>> machines and remove/add OSDs to test whether recovery/rebalance went fine
> > >>>
> > >>> it also does not have any external traffic at the moment
> > >>>
> > >>>
> > >>>>> On 07 Sep 2015, at 12:58, Mariusz Gronczewski <mariusz.gronczewski@xxxxxxxxxxxx> wrote:
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> over the weekend (I was on vacation so I don't know exactly what happened)
> > >>>>> our OSDs started eating in excess of 6 GB of RAM (well, RSS), which was a
> > >>>>> problem considering that we had only 8 GB of RAM for 4 OSDs (about 700
> > >>>>> PGs per OSD and about 70 GB of space used). So a spam of coredumps and OOMs
> > >>>>> ground the OSDs down to unusability.
> > >>>>>
> > >>>>> I then upgraded one of the OSDs to hammer, which made it a bit better
> > >>>>> (~2 GB per OSD) but still much higher usage than before.
> > >>>>>
> > >>>>> Any ideas what the reason for that would be? The logs are mostly full of
> > >>>>> OSDs trying to recover and timed-out heartbeats.
> > >>>>>
> > >>>>> --
> > >>>>> Mariusz Gronczewski, Administrator
> > >>>>>
> > >>>>> Efigence S. A.
> > >>>>> ul. Wołoska 9a, 02-583 Warszawa
> > >>>>> T: [+48] 22 380 13 13
> > >>>>> F: [+48] 22 380 13 14
> > >>>>> E: mariusz.gronczewski@xxxxxxxxxxxx
> > >>>>> _______________________________________________
> > >>>>> ceph-users mailing list
> > >>>>> ceph-users@xxxxxxxxxxxxxx
> > >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> Mariusz Gronczewski, Administrator
> > >>>
> > >>> Efigence S. A.
> > >>> ul. Wołoska 9a, 02-583 Warszawa
> > >>> T: [+48] 22 380 13 13
> > >>> F: [+48] 22 380 13 14
> > >>> E: mariusz.gronczewski@xxxxxxxxxxxx
> > >>
> > >
> > >
> > > --
> > > Mariusz Gronczewski, Administrator
> > >
> > > Efigence S. A.
> > > ul. Wołoska 9a, 02-583 Warszawa
> > > T: [+48] 22 380 13 13
> > > F: [+48] 22 380 13 14
> > > E: mariusz.gronczewski@xxxxxxxxxxxx
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
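Two things mentioned above are worth spelling out as concrete commands: the flags Mariusz set, and the two reweight commands Jan warns about, which are easy to mix up. This is only a sketch; the OSD id and weight values are made-up examples:

    # noin keeps booting OSDs from being marked "in"; nobackfill/norecover pause
    # backfill and recovery cluster-wide (undo with "ceph osd unset <flag>"):
    ceph osd set noin
    ceph osd set nobackfill
    ceph osd set norecover

    # "crush reweight" changes the CRUSH weight, i.e. how much data the OSD should
    # hold relative to its peers, so changing it moves PGs around:
    ceph osd crush reweight osd.3 1.0

    # "reweight" sets a temporary override between 0.0 and 1.0 on top of the CRUSH
    # weight, e.g. to partially drain an OSD without editing the CRUSH map:
    ceph osd reweight 3 0.8
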
--
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczewski@xxxxxxxxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com