On Mon, Oct 7, 2019 at 7:20 AM Vladimir Brik
<vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
>
> > Do you have statistics on the size of the OSDMaps or count of them
> > which were being maintained by the OSDs?
> No, I don't think so. How can I find this information?

Hmm I don't know if we directly expose the size of maps. There are
perfcounters which expose the range of maps being kept around but I
don't know their names off-hand.

Maybe it's something else involving the bluestore cache or whatever; if
you're not using the newer memory limits I'd switch to those but
otherwise I dunno.
-Greg
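For reference, a rough sketch of commands that can surface the numbers
discussed in this thread (osd.12 and the 4 GiB value are only example
placeholders; "ceph daemon" has to be run on the host where that OSD
lives):

    # Range of OSDMap epochs the OSD is holding on to; newest_map minus
    # oldest_map is roughly the count of maps it keeps around:
    ceph daemon osd.12 status

    # Per-mempool memory breakdown, including the osdmap pools and the
    # bluestore caches:
    ceph daemon osd.12 dump_mempools

    # Raw perf counters, if you want to graph any of this over time:
    ceph daemon osd.12 perf dump

    # The "newer memory limits" are presumably osd_memory_target, which
    # makes the OSD autotune its caches toward a total RSS target:
    ceph config get osd.12 osd_memory_target
    ceph config set osd osd_memory_target 4294967296   # example: 4 GiB

    # Free pages per NUMA node/zone by order, to quantify the
    # fragmentation described below:
    cat /proc/buddyinfo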
>
> Memory consumption started to climb again:
> https://icecube.wisc.edu/~vbrik/graph-3.png
>
> Some more info (not sure if relevant or not):
>
> I increased the size of the swap on the servers to 10GB and it's being
> completely utilized, even though there is still quite a bit of free
> memory.
>
> It appears that memory is highly fragmented on NUMA node 0 of all the
> servers. Some of the servers have no free pages higher than order 0.
> (Memory on NUMA node 1 of the servers appears much less fragmented.)
>
> The servers have 192GB of RAM, 2 NUMA nodes.
>
>
> Vlad
>
>
> On 10/4/19 6:09 PM, Gregory Farnum wrote:
> > Do you have statistics on the size of the OSDMaps or count of them
> > which were being maintained by the OSDs? I'm not sure why having noout
> > set would change that if all the nodes were alive, but that's my bet.
> > -Greg
> >
> > On Thu, Oct 3, 2019 at 7:04 AM Vladimir Brik
> > <vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
> >>
> >> And, just as unexpectedly, things have returned to normal overnight:
> >> https://icecube.wisc.edu/~vbrik/graph-1.png
> >>
> >> The change seems to have coincided with the beginning of Rados Gateway
> >> activity (before, it was essentially zero). I can see nothing in the
> >> logs that would explain what happened, though.
> >>
> >> Vlad
> >>
> >>
> >> On 10/2/19 3:43 PM, Vladimir Brik wrote:
> >>> Hello
> >>>
> >>> I am running a Ceph 14.2.2 cluster, and a few days ago the memory
> >>> consumption of our OSDs started to grow unexpectedly on all 5 nodes,
> >>> after being stable for about 6 months.
> >>>
> >>> Node memory consumption: https://icecube.wisc.edu/~vbrik/graph.png
> >>> Average OSD resident size: https://icecube.wisc.edu/~vbrik/image.png
> >>>
> >>> I am not sure what changed to cause this. Cluster usage has been very
> >>> light (typically <10 iops) during this period, and the number of
> >>> objects has stayed about the same.
> >>>
> >>> The only unusual occurrence was the reboot of one of the nodes the
> >>> day before (a firmware update). For the reboot I ran "ceph osd set
> >>> noout", but forgot to unset it until several days later. Unsetting
> >>> noout did not stop the increase in memory consumption.
> >>>
> >>> I don't see anything unusual in the logs.
> >>>
> >>> Our nodes have SSDs and HDDs. The resident set size of SSD OSDs is
> >>> about 3.7GB. The resident set size of HDD OSDs varies from about 5GB
> >>> to 12GB. I don't know why there is such a big spread. All HDDs are
> >>> 10TB, 72-76% utilized, with 101-104 PGs.
> >>>
> >>> Does anybody know what might be the problem here and how to address
> >>> or debug it?
> >>>
> >>>
> >>> Thanks very much,
> >>>
> >>> Vlad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com