Re: librbd leaks memory on crushmap updates

On Wed, Jun 22, 2022 at 11:14 AM Peter Lieven <pl@xxxxxxx> wrote:
>
>
>
> Sent from my iPhone
>
> > Am 22.06.2022 um 10:35 schrieb Ilya Dryomov <idryomov@xxxxxxxxx>:
> >
> > On Tue, Jun 21, 2022 at 8:52 PM Peter Lieven <pl@xxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >>
> >> we noticed that some of our long-running VMs (one year without migration) seem to have a very slow memory leak. A dump of the leaked memory revealed that it contained osd and pool information, so we concluded that it must be related to crush map updates. We then wrote a test script in our dev environment that constantly takes OSDs out and kicks them back in as soon as all remappings are done.
> >
> > Hi Peter,
> >
> > How did you determine what memory is being leaked?
>
> I found relatively large allocations in the QEMU smaps and checked their contents. They contained several hundred repetitions of osd and pool names. We use the default builds on Ubuntu 20.04. Is there a special memory allocator in place that might not clean up properly?
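For reference, the kind of scan described above -- looking for large anonymous mappings in /proc/<pid>/smaps -- can be sketched roughly like this. The parsing follows the standard Linux smaps layout; the 1 MB threshold and the "no pathname field means anonymous" heuristic are simplifications for illustration:

```python
# Rough sketch: list large anonymous mappings from an smaps dump.
# In the smaps format, anonymous mappings have no pathname field at the
# end of their header line (special regions like [heap] are ignored here).

def large_anon_regions(smaps_text, min_kb=1024):
    """Return (start_address, size_kb) for anonymous mappings >= min_kb."""
    regions = []
    header = None
    for line in smaps_text.splitlines():
        first = line.split(" ", 1)[0]
        if "-" in first and ":" not in first:
            # mapping header, e.g.
            # "7f1c2c000000-7f1c2c800000 rw-p 00000000 00:00 0"
            header = line.split()
        elif line.startswith("Size:") and header is not None:
            size_kb = int(line.split()[1])
            if len(header) <= 5 and size_kb >= min_kb:  # no path -> anonymous
                regions.append((header[0].split("-")[0], size_kb))
            header = None
    return regions

# Usage (on Linux), e.g. against the current process:
# with open("/proc/self/smaps") as f:
#     print(large_anon_regions(f.read()))
```

Candidate regions found this way can then be dumped (e.g. with gdb) and inspected for repeated osd/pool names, as described above.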

Not really a special allocator, but there is something referred to as
mempools -- an abstraction created to help with fine-grained memory use
tracking.  It is mostly used on the OSD side (various BlueStore caches,
etc.), but also for osdmaps on the client side.
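If the client has an admin socket configured, the mempool counters can be sampled over time with the `dump_mempools` admin socket command. A hedged sketch of pulling the osdmap figures out of that JSON -- the key layout below is an assumption based on typical `dump_mempools` output, so verify it against your release:

```python
import json

# Hedged sketch: extract osdmap mempool usage from `dump_mempools` JSON,
# e.g. obtained via
#   ceph daemon /var/run/ceph/ceph-client.admin.<pid>.asok dump_mempools
# The "mempool"/"by_pool" key layout is an assumption; adjust as needed.

def osdmap_mempool_usage(dump_json):
    """Return (items, bytes) for the osdmap mempool, or (0, 0) if absent."""
    pools = json.loads(dump_json)["mempool"]["by_pool"]
    osdmap = pools.get("osdmap", {})
    return osdmap.get("items", 0), osdmap.get("bytes", 0)
```

Sampling this periodically while the test script churns the crush map would show whether the growth is attributed to the osdmap pool or lies outside mempool accounting entirely.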

>
> >
> >>
> >> With that script running, the PSS usage of the QEMU process is constantly increasing (the main memory of the VM is in hugetlbfs) at a rate of about 5 MB/day for a very small dev cluster with approx. 40 OSDs and 5 pools.
> >>
> >> We first observed this issue with Nautilus 14.2.22 and then also tried Octopus 15.2.16, where issue #38403 should have been fixed.
> >
> > With the release of 15.2.17 in a few weeks, Octopus would be going
> > EOL.  Given that this is a dev cluster, can you try something more
> > recent -- preferably Quincy?
>
> Yes, I can, as this is only a client issue. But for production, moving to Quincy is not an option.

If the issue exists in Quincy, it will get a lot more attention ;)
We will certainly consider a backport for the upcoming final Octopus
release if the issue is identified and fixed in time.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



