Re: Huge memory usage spike in OSD on hammer/giant

Well, if you plan for an OSD to have 2GB per daemon and it suddenly eats
4x as much RAM, you can get the cluster into an unrecoverable state if
you can't just add RAM at will. I managed to recover it because I had
only 4 OSDs per machine, but I can't imagine what would happen on a
36-OSD machine...

Of course, the failure on my cluster was an extreme case (flapping
caused by a NIC driver basically left most of the PGs on the "wrong"
OSDs, causing excessive memory usage), but if one bad driver can get the
cluster into such a hard-to-recover state, that is pretty bad.
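
(For what it's worth, a rough sketch of how that kind of flap is usually
contained while the NIC/driver is being fixed; these are the standard
cluster flags, nothing specific to this incident:

  ceph osd set nodown    # stop the flapping OSDs from being marked down
  ceph osd set noout     # stop them from being marked out and PGs remapped
  # ... replace/fix the NIC driver ...
  ceph osd unset nodown
  ceph osd unset noout

That at least avoids the mass PG remapping that triggered the memory
growth here.)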

And swap won't help in that case, as it will only make the daemons time
out.

It would be preferable to have slower recovery rather than the kernel
OOM-killing processes every few minutes.
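
A minimal sketch of what I mean by "slower recovery" (stock OSD options,
I'm not claiming they would have fully prevented the spike):

  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

or persistently in ceph.conf under [osd]:

  osd max backfills = 1
  osd recovery max active = 1

Fewer concurrent backfills means fewer PGs in flight per OSD at any one
time, which should also keep the memory growth during recovery more
bounded.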



On Wed, 9 Sep 2015 16:36:44 +0200, Jan Schermer
<jan@xxxxxxxxxxx> wrote:

> You can sort of simulate it:
> 
> >>>> * E.g. if you do something silly like "ceph osd crush reweight osd.1 10000" you will see the RSS of osd.28 skyrocket. Reweighting it back down will not release the memory until you do "heap release".
> 
> But this is expected, methinks.
> 
> Jan
> 
> 
> > On 09 Sep 2015, at 15:51, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> > 
> > Yes, under no circumstances is it really ok for an OSD to consume 8GB of RSS! :)  It'd be really swell if we could replicate that kind of memory growth in-house on demand.
> > 
> > Mark
> > 
> > On 09/09/2015 05:56 AM, Jan Schermer wrote:
> >> Sorry if I wasn't clear.
> >> Going from 2GB to 8GB is not normal, although some slight bloating is expected. In your case it just got much worse than usual for reasons yet unknown.
> >> 
> >> Jan
> >> 
> >> 
> >>> On 09 Sep 2015, at 12:40, Mariusz Gronczewski <mariusz.gronczewski@xxxxxxxxxxxx> wrote:
> >>> 
> >>> 
> >>> well I was going by
> >>> http://ceph.com/docs/master/start/hardware-recommendations/ and planning for 2GB per OSD, so that was a surprise... maybe there should be a warning somewhere?
> >>> 
> >>> 
> >>> On Wed, 9 Sep 2015 12:21:15 +0200, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> >>> 
> >>>> The memory gets used for additional PGs on the OSD.
> >>>> If you were to "swap" PGs between two OSDs, you'll get memory wasted on both of them because tcmalloc doesn't release it.*
> >>>> It usually stabilizes after a few days even during backfills, so it does get reused if needed.
> >>>> If for some reason your OSDs get to 8GB RSS, then I recommend you just get more memory, or try disabling tcmalloc, which can either help or make it even worse :-)
> >>>> 
> >>>> * E.g. if you do something silly like "ceph osd crush reweight osd.1 10000" you will see the RSS of osd.28 skyrocket. Reweighting it back down will not release the memory until you do "heap release".
> >>>> 
> >>>> Jan
> >>>> 
> >>>> 
> >>>>> On 09 Sep 2015, at 12:05, Mariusz Gronczewski <mariusz.gronczewski@xxxxxxxxxxxx> wrote:
> >>>>> 
> >>>>> On Tue, 08 Sep 2015 16:14:15 -0500, Chad William Seys
> >>>>> <cwseys@xxxxxxxxxxxxxxxx> wrote:
> >>>>> 
> >>>>>> Does 'ceph tell osd.* heap release' help with OSD RAM usage?
> >>>>>> 
> >>>>>> From
> >>>>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-August/003932.html
> >>>>>> 
> >>>>>> Chad.
> >>>>> 
> >>>>> it did help now, but the cluster is in a clean state at the moment. I
> >>>>> didn't know about that one, thanks.
> >>>>> 
> >>>>> High memory usage stopped once the cluster rebuilt, but I had planned
> >>>>> the cluster for 2GB per OSD, so I needed to add RAM just to get to the
> >>>>> point of Ceph starting to rebuild, as some OSDs ate up to 8 GB during
> >>>>> recovery.
> >>>>> 
> >>>> 
> >>> 
> >>> 
> >>> 
> >> 
> 



-- 
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczewski@xxxxxxxxxxxx
<mailto:mariusz.gronczewski@xxxxxxxxxxxx>


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
