On Mon, Oct 7, 2019 at 7:20 AM Vladimir Brik
<vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
>
> > Do you have statistics on the size of the OSDMaps or count of them
> > which were being maintained by the OSDs?
> No, I don't think so. How can I find this information?

Hmm I don't know if we directly expose the size of maps. There are
perfcounters which expose the range of maps being kept around but I
don't know their names off-hand.

Maybe it's something else involving the bluestore cache or whatever; if
you're not using the newer memory limits I'd switch to those but
otherwise I dunno.
-Greg
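For reference, a rough sketch of commands that can surface the numbers
discussed in this thread (osd.12 and the 4 GiB value are only example
placeholders; "ceph daemon" has to be run on the host where that OSD
lives):

    # Range of OSDMap epochs the OSD is holding on to; newest_map minus
    # oldest_map is roughly the count of maps it keeps around:
    ceph daemon osd.12 status

    # Per-mempool memory breakdown, including the osdmap pools and the
    # bluestore caches:
    ceph daemon osd.12 dump_mempools

    # Raw perf counters, if you want to graph any of this over time:
    ceph daemon osd.12 perf dump

    # The "newer memory limits" are presumably osd_memory_target, which
    # makes the OSD autotune its caches toward a total RSS target:
    ceph config get osd.12 osd_memory_target
    ceph config set osd osd_memory_target 4294967296   # example: 4 GiB

    # Free pages per NUMA node/zone by order, to quantify the
    # fragmentation described below:
    cat /proc/buddyinfo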
>
> Memory consumption started to climb again:
> https://icecube.wisc.edu/~vbrik/graph-3.png
>
> Some more info (not sure if relevant or not):
>
> I increased the size of the swap on the servers to 10GB and it's being
> completely utilized, even though there is still quite a bit of free
> memory.
>
> It appears that memory is highly fragmented on NUMA node 0 of all the
> servers. Some of the servers have no free pages higher than order 0.
> (Memory on NUMA node 1 of the servers appears much less fragmented.)
>
> The servers have 192GB of RAM, 2 NUMA nodes.
>
>
> Vlad
>
>
> On 10/4/19 6:09 PM, Gregory Farnum wrote:
> > Do you have statistics on the size of the OSDMaps or count of them
> > which were being maintained by the OSDs? I'm not sure why having noout
> > set would change that if all the nodes were alive, but that's my bet.
> > -Greg
> >
> > On Thu, Oct 3, 2019 at 7:04 AM Vladimir Brik
> > <vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
> >>
> >> And, just as unexpectedly, things have returned to normal overnight:
> >> https://icecube.wisc.edu/~vbrik/graph-1.png
> >>
> >> The change seems to have coincided with the beginning of Rados Gateway
> >> activity (before, it was essentially zero). I can see nothing in the
> >> logs that would explain what happened, though.
> >>
> >> Vlad
> >>
> >>
> >> On 10/2/19 3:43 PM, Vladimir Brik wrote:
> >>> Hello
> >>>
> >>> I am running a Ceph 14.2.2 cluster, and a few days ago the memory
> >>> consumption of our OSDs started to grow unexpectedly on all 5 nodes,
> >>> after being stable for about 6 months.
> >>>
> >>> Node memory consumption: https://icecube.wisc.edu/~vbrik/graph.png
> >>> Average OSD resident size: https://icecube.wisc.edu/~vbrik/image.png
> >>>
> >>> I am not sure what changed to cause this. Cluster usage has been very
> >>> light (typically <10 iops) during this period, and the number of
> >>> objects has stayed about the same.
> >>>
> >>> The only unusual occurrence was the reboot of one of the nodes the
> >>> day before (a firmware update). For the reboot I ran "ceph osd set
> >>> noout", but forgot to unset it until several days later. Unsetting
> >>> noout did not stop the increase in memory consumption.
> >>>
> >>> I don't see anything unusual in the logs.
> >>>
> >>> Our nodes have SSDs and HDDs. The resident set size of SSD OSDs is
> >>> about 3.7GB. The resident set size of HDD OSDs varies from about 5GB
> >>> to 12GB. I don't know why there is such a big spread. All HDDs are
> >>> 10TB, 72-76% utilized, with 101-104 PGs.
> >>>
> >>> Does anybody know what might be the problem here and how to address
> >>> or debug it?
> >>>
> >>>
> >>> Thanks very much,
> >>>
> >>> Vlad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com