Hmm, even network traffic went up. Nothing in the logs on the mons around the time this started (9/4, ~6 AM)?

Jan

> On 07 Sep 2015, at 14:11, Mariusz Gronczewski <mariusz.gronczewski@xxxxxxxxxxxx> wrote:
>
> On Mon, 7 Sep 2015 13:44:55 +0200, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>
>> Maybe some configuration change occurred that now takes effect when you start the OSD?
>> Not sure what could affect memory usage though - some ulimit values maybe (stack size), the number of OSD threads (compare the number on this OSD to the rest of the OSDs), fd cache size. Look in /proc and compare everything.
>> Also look at "ceph osd tree" - didn't someone touch it while you were gone?
>>
>> Jan
>>
>
>> number of OSD threads (compare the number from this OSD to the rest of
>> OSDs),
>
> It occurred on all OSDs, and it looked like this:
> http://imgur.com/IIMIyRG
>
> Sadly I was on vacation, so I didn't manage to catch it earlier ;/ but I'm
> sure there was no config change.
>
>
>>> On 07 Sep 2015, at 13:40, Mariusz Gronczewski <mariusz.gronczewski@xxxxxxxxxxxx> wrote:
>>>
>>> On Mon, 7 Sep 2015 13:02:38 +0200, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>>>
>>>> Apart from a bug causing this, it could be caused by a failure of other OSDs (even a temporary one) that starts backfills:
>>>>
>>>> 1) something fails
>>>> 2) some PGs move to this OSD
>>>> 3) this OSD has to allocate memory for all the PGs
>>>> 4) whatever failed comes back up
>>>> 5) the memory is never released
>>>>
>>>> A similar scenario is possible if, for example, someone confuses "ceph osd crush reweight" with "ceph osd reweight" (yes, this happened to me :-)).
>>>>
>>>> Did you try just restarting the OSD before you upgraded it?
>>>
>>> Stopped, upgraded, started. It helped a bit (<3GB per OSD), but beside
>>> that nothing changed. I've tried waiting until it stops eating CPU and then
>>> restarting it, but it still eats >2GB of memory, which means I can't start
>>> all 4 OSDs at the same time ;/
>>>
>>> I've also added the noin, nobackfill and norecover flags, but that didn't help.
>>>
>>> It is surprising to me because before this, all 4 OSDs together ate less than
>>> 2GB of memory, so I thought I had enough headroom, and we did restart
>>> machines and remove/add OSDs to test that recovery/rebalance goes fine.
>>>
>>> It also does not have any external traffic at the moment.
>>>
>>>
>>>>> On 07 Sep 2015, at 12:58, Mariusz Gronczewski <mariusz.gronczewski@xxxxxxxxxxxx> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> over the weekend (I was on vacation, so I don't know exactly what happened)
>>>>> our OSDs started eating in excess of 6GB of RAM (well, RSS), which was a
>>>>> problem considering that we had only 8GB of RAM for 4 OSDs (about 700
>>>>> PGs per OSD and about 70GB of space used), so a spam of coredumps and OOMs
>>>>> brought the OSDs down to the point of being unusable.
>>>>>
>>>>> I then upgraded one of the OSDs to hammer, which made it a bit better (~2GB
>>>>> per OSD), but usage is still much higher than before.
>>>>>
>>>>> Any ideas what the reason for that could be? The logs are mostly full of
>>>>> OSDs trying to recover and timed-out heartbeats.
>>>
>>>
>
>
> --
> Mariusz Gronczewski, Administrator
>
> Efigence S. A.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13
> F: [+48] 22 380 13 14
> E: mariusz.gronczewski@xxxxxxxxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
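
A minimal sketch, assuming a standard Linux procfs and the stock ceph CLI, of the checks and flags discussed in the thread above (comparing per-OSD memory, limits and thread counts, inspecting the CRUSH tree, and setting/clearing the recovery flags). It is not taken from the thread itself; the PID selection is a placeholder and should be adapted to the specific OSD being compared.

    # Compare resident memory (RSS, in KB) across the local ceph-osd processes
    ps -C ceph-osd -o pid,rss,args

    # Pick one ceph-osd process (placeholder: the oldest one) and compare its
    # ulimits (stack size, open files) and thread count against the other OSDs
    PID=$(pgrep -o ceph-osd)
    cat /proc/$PID/limits
    ls /proc/$PID/task | wc -l

    # Check whether the CRUSH tree or weights were changed while away
    ceph osd tree

    # Temporarily stop data movement while restarting OSDs, then re-enable it
    ceph osd set noin && ceph osd set nobackfill && ceph osd set norecover
    ceph osd unset noin && ceph osd unset nobackfill && ceph osd unset norecover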