On Mon, 7 Sep 2015 13:02:38 +0200, Jan Schermer <jan@xxxxxxxxxxx> wrote:

> Apart from a bug causing this, this could be caused by a failure of other OSDs (even a temporary one) that starts backfills.
>
> 1) something fails
> 2) some PGs move to this OSD
> 3) this OSD has to allocate memory for all the PGs
> 4) whatever failed gets back up
> 5) the memory is never released.
>
> A similar scenario is possible if, for example, someone confuses "ceph osd crush reweight" with "ceph osd reweight" (yes, this happened to me :-)).
>
> Did you try just restarting the OSD before you upgraded it?

Stopped, upgraded, started. It helped a bit (<3GB per OSD), but besides that nothing changed. I've tried waiting until it stops eating CPU and then restarting it, but it still eats >2GB of memory, which means I can't start all 4 OSDs at the same time ;/

I've also set the noin, nobackfill and norecover flags, but that didn't help.

It is surprising to me, because before this all 4 OSDs together used less than 2GB of memory, so I thought I had enough headroom. We had also restarted machines and removed/added OSDs to test whether recovery/rebalance goes fine.

The cluster also has no external traffic at the moment.

> > On 07 Sep 2015, at 12:58, Mariusz Gronczewski <mariusz.gronczewski@xxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > over a weekend (I was on vacation, so I didn't see exactly what happened)
> > our OSDs started eating in excess of 6GB of RAM (well, RSS), which was a
> > problem considering that we had only 8GB of RAM for 4 OSDs (about 700
> > PGs per OSD and about 70GB of space used). So a spam of coredumps and OOMs
> > brought the OSDs down to unusability.
> >
> > I then upgraded one of the OSDs to hammer, which made it a bit better (~2GB
> > per OSD) but still much higher usage than before.
> >
> > Any ideas what could be the reason for that? Logs are mostly full of
> > OSDs trying to recover and timed-out heartbeats.
> >
> > --
> > Mariusz Gronczewski, Administrator
> >
> > Efigence S. A.
> > ul. Wołoska 9a, 02-583 Warszawa
> > T: [+48] 22 380 13 13
> > F: [+48] 22 380 13 14
> > E: mariusz.gronczewski@xxxxxxxxxxxx
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczewski@xxxxxxxxxxxx
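The headroom argument in this thread is simple arithmetic: 8GB of RAM shared by 4 OSDs leaves at most about 2GB per OSD, before counting the OS itself. The following is a rough Python sketch that puts the RSS figures reported in the thread against that budget. The helper name `overcommit` is hypothetical, and the sketch ignores OS and page-cache usage. The 0.5GB "before" value is an upper bound derived from "all 4 OSDs total ate less than 2GBs".

```python
# Rough memory budget for the node described in the thread.
# Hypothetical helper; figures are approximate values quoted in the emails,
# and OS / page-cache memory is deliberately ignored.
RAM_GB = 8.0
OSDS_PER_NODE = 4

# Upper bound on what each OSD may use if the OSDs got all of the RAM.
budget_per_osd = RAM_GB / OSDS_PER_NODE  # 2.0 GB


def overcommit(rss_per_osd_gb: float, ram_gb: float = RAM_GB,
               osds: int = OSDS_PER_NODE) -> float:
    """Return total OSD RSS as a multiple of physical RAM."""
    return rss_per_osd_gb * osds / ram_gb


if __name__ == "__main__":
    for label, rss in [("before the incident (upper bound)", 0.5),
                       ("after upgrade to hammer", 2.0),
                       ("weekend peak", 6.0)]:
        print(f"{label}: {rss:.1f} GB/OSD -> {overcommit(rss):.2f}x RAM")
```

With these numbers the weekend peak asks for roughly three times the physical RAM. Even the ~2GB per OSD seen after the hammer upgrade consumes all of it, which is consistent with not being able to start all four OSDs at once.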