Re: Huge memory usage spike in OSD on hammer/giant


 



On Mon, 7 Sep 2015 13:02:38 +0200, Jan Schermer <jan@xxxxxxxxxxx> wrote:

> Apart from a bug causing this, it could be caused by a failure of other OSDs (even a temporary one) that starts backfills.
> 
> 1) something fails
> 2) some PGs move to this OSD
> 3) this OSD has to allocate memory for all the PGs
> 4) whatever fails gets back up
> 5) the memory is never released.
> 
> A similar scenario is possible if, for example, someone confuses "ceph osd crush reweight" with "ceph osd reweight" (yes, this happened to me :-)).
> 
> Did you try just restarting the OSD before you upgraded it?

Stopped, upgraded, started. It helped a bit (<3GB per OSD), but apart from
that nothing changed. I've tried waiting until it stops eating CPU and then
restarting it, but it still eats >2GB of memory, which means I can't start
all 4 OSDs at the same time ;/
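
The arithmetic behind that, as a rough sketch: the 8GB of RAM and the per-OSD figures are the ones from this thread, while the 1GB reserved for the OS and the helper name `osds_that_fit` are assumptions made up for illustration.

```python
def osds_that_fit(total_ram_gb, per_osd_gb, reserve_gb=1.0):
    """Return how many OSDs fit in RAM if each one needs per_osd_gb of RSS.

    reserve_gb is an assumed allowance for the OS and everything else;
    it is not a measured value.
    """
    usable = total_ram_gb - reserve_gb
    if per_osd_gb <= 0 or usable <= 0:
        return 0
    return int(usable // per_osd_gb)


# Per-OSD figures taken from the thread: ~0.5GB before, 2GB and 3GB now.
for per_osd in (0.5, 2.0, 3.0):
    print(per_osd, "GB per OSD ->", osds_that_fit(8, per_osd), "OSDs fit")
```

With anything at or above 2GB per OSD, fewer than 4 OSDs fit on this box, whatever the exact reserve is.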

I've also set the noin, nobackfill and norecover flags, but that didn't help.
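
For reference, a minimal sketch of how those flags get set and cleared via `ceph osd set <flag>` / `ceph osd unset <flag>`. The helper names `flag_commands` and `apply_flags` are hypothetical, and the sketch defaults to a dry run that only prints the commands.

```python
import subprocess

# The three flags mentioned above.
FLAGS = ("noin", "nobackfill", "norecover")


def flag_commands(action, flags=FLAGS):
    """Build one `ceph osd set|unset <flag>` command per flag."""
    if action not in ("set", "unset"):
        raise ValueError("action must be 'set' or 'unset'")
    return [["ceph", "osd", action, flag] for flag in flags]


def apply_flags(action, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in flag_commands(action):
        if dry_run:
            print(" ".join(cmd))
        else:
            # Needs the ceph CLI and an admin keyring on this host.
            subprocess.run(cmd, check=True)


apply_flags("set")  # dry run: only prints the three commands
```

Calling `apply_flags("unset", dry_run=False)` would clear them again once the OSDs are stable.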

This is surprising to me because previously all 4 OSDs together used less
than 2GB of memory, so I thought I had enough headroom, and we did restart
machines and remove/add OSDs to test that recovery/rebalance goes fine.
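
To compare before/after numbers like these, one way to total the RSS of all OSD daemons on a host is to read `VmRSS` from `/proc`. This is only a sketch: it assumes a Linux `/proc` layout and that the daemons' process name is `ceph-osd`, and the function names are hypothetical.

```python
import os
import re


def parse_status(text):
    """Return (process name, RSS in kB) from the text of /proc/<pid>/status."""
    name = re.search(r"^Name:\s+(\S+)", text, re.M)
    rss = re.search(r"^VmRSS:\s+(\d+)\s+kB", text, re.M)
    # Kernel threads have no VmRSS line; count them as 0.
    return (name.group(1) if name else None, int(rss.group(1)) if rss else 0)


def total_rss_kb(proc_name="ceph-osd", proc_root="/proc"):
    """Sum VmRSS over every process whose name equals proc_name."""
    if not os.path.isdir(proc_root):
        return 0
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                name, rss_kb = parse_status(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if name == proc_name:
            total += rss_kb
    return total


print("ceph-osd RSS total: %.2f GiB" % (total_rss_kb() / 1024.0 ** 2))
```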

it also does not have any external traffic at the moment

 
> > On 07 Sep 2015, at 12:58, Mariusz Gronczewski <mariusz.gronczewski@xxxxxxxxxxxx> wrote:
> > 
> > Hi,
> > 
> > over the weekend (I was on vacation, so I didn't see exactly what happened)
> > our OSDs started eating in excess of 6GB of RAM (well, RSS), which was a
> > problem considering that we had only 8GB of RAM for 4 OSDs (about 700
> > PGs per OSD and about 70GB of space used). The resulting spam of
> > coredumps and OOMs knocked the OSDs down to the point of unusability.
> > 
> > I then upgraded one of the OSDs to hammer, which made it a bit better
> > (~2GB per OSD), but usage is still much higher than before.
> > 
> > Any ideas what the reason for that could be? The logs are mostly full of
> > OSDs trying to recover and timed-out heartbeats.
> > 
> > -- 
> > Mariusz Gronczewski, Administrator
> > 
> > Efigence S. A.
> > ul. Wołoska 9a, 02-583 Warszawa
> > T: [+48] 22 380 13 13
> > F: [+48] 22 380 13 14
> > E: mariusz.gronczewski@xxxxxxxxxxxx
> > <mailto:mariusz.gronczewski@xxxxxxxxxxxx>
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 



-- 
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczewski@xxxxxxxxxxxx
<mailto:mariusz.gronczewski@xxxxxxxxxxxx>


