On Mon, 7 Sep 2015 13:44:55 +0200, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> Maybe some configuration change occurred that now takes effect when you start the OSD?
> Not sure what could affect memory usage, though - maybe some ulimit values (stack size), the number of OSD threads (compare the number from this OSD to the rest of the OSDs), or the fd cache size. Look in /proc and compare everything.
> Also look at "ceph osd tree" - didn't someone touch it while you were gone?
>
> Jan
>
> number of OSD threads (compare the number from this OSD to the rest of the OSDs),

it occurred on all OSDs, and it looked like this: http://imgur.com/IIMIyRG
Sadly I was on vacation, so I didn't manage to catch it before ;/ but I'm sure there was no config change.

> > On 07 Sep 2015, at 13:40, Mariusz Gronczewski <mariusz.gronczewski@xxxxxxxxxxxx> wrote:
> >
> > On Mon, 7 Sep 2015 13:02:38 +0200, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> >
> >> Apart from a bug causing this, it could be caused by a failure of other OSDs (even a temporary one) that starts backfills:
> >>
> >> 1) something fails
> >> 2) some PGs move to this OSD
> >> 3) this OSD has to allocate memory for all the PGs
> >> 4) whatever failed gets back up
> >> 5) the memory is never released
> >>
> >> A similar scenario is possible if, for example, someone confuses "ceph osd crush reweight" with "ceph osd reweight" (yes, this happened to me :-)).
> >>
> >> Did you try just restarting the OSD before you upgraded it?
> >
> > stopped, upgraded, started. It helped a bit (<3 GB per OSD), but besides that nothing changed.
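[Jan's advice to "look in /proc and compare everything" can be scripted. A minimal sketch, assuming the daemons show up as `ceph-osd` in the process list (adjust the name if your distribution differs); reading another user's `/proc/<pid>/fd` needs root:]

```shell
# For every running ceph-osd, dump the values worth comparing across OSDs:
# thread count, resident memory, inherited ulimits, and open file descriptors.
for pid in $(pgrep -x ceph-osd); do
    echo "=== PID $pid: $(tr '\0' ' ' < "/proc/$pid/cmdline") ==="
    grep -E '^(Threads|VmRSS)' "/proc/$pid/status"              # thread count and RSS
    grep -E 'stack size|open files' "/proc/$pid/limits"         # ulimit values the daemon inherited
    echo "open fds: $(ls "/proc/$pid/fd" 2>/dev/null | wc -l)"  # fd cache usage
done
```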
> > I've tried to wait till it stops eating CPU and then restart it, but it still eats >2 GB of memory, which means I can't start all 4 OSDs at the same time ;/
> >
> > I've also added the noin, nobackfill and norecover flags, but that didn't help.
> >
> > It is surprising to me, because before this all 4 OSDs together ate less than 2 GB of memory, so I thought I had enough headroom, and we did restart machines and removed/added OSDs to test that recovery/rebalance goes fine.
> >
> > It also does not have any external traffic at the moment.
> >
> >>> On 07 Sep 2015, at 12:58, Mariusz Gronczewski <mariusz.gronczewski@xxxxxxxxxxxx> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Over the weekend (I was on vacation, so I didn't catch exactly what happened) our OSDs started eating in excess of 6 GB of RAM (well, RSS), which was a problem considering that we had only 8 GB of RAM for 4 OSDs (about 700 PGs per OSD and about 70 GB of space used). So a spam of coredumps and OOMs knocked the OSDs down to unusability.
> >>>
> >>> I then upgraded one of the OSDs to Hammer, which made it a bit better (~2 GB per OSD), but still much higher usage than before.
> >>>
> >>> Any ideas what would be the reason for that? The logs are mostly full of OSDs trying to recover and timed-out heartbeats.
> >>>
> >>> --
> >>> Mariusz Gronczewski, Administrator
> >>>
> >>> Efigence S. A.
> >>> ul. Wołoska 9a, 02-583 Warszawa
> >>> T: [+48] 22 380 13 13
> >>> F: [+48] 22 380 13 14
> >>> E: mariusz.gronczewski@xxxxxxxxxxxx
> >>> _______________________________________________
> >>> ceph-users mailing list
> >>> ceph-users@xxxxxxxxxxxxxx
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
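[For reference, the noin/nobackfill/norecover flags mentioned above are set and cleared with the standard ceph CLI, run from any node with admin credentials. They are cluster-wide, so remember to unset them afterwards:]

```shell
# Pause data movement while investigating memory usage.
ceph osd set noin        # restarted OSDs won't be marked "in" automatically
ceph osd set nobackfill  # suspend backfill
ceph osd set norecover   # suspend recovery

# ... restart OSDs one at a time and watch their RSS ...

# Re-enable normal operation once done.
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset noin
```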
--
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczewski@xxxxxxxxxxxx