Hi,

we have also experienced several ceph-mgr OOM kills on Ceph v16.2.13 with
120T/200T of data. Is there a tracker for this problem? Does upgrading to
17.x "solve" it?

Kind regards,
Rok

On Wed, Sep 6, 2023 at 9:36 PM Ernesto Puerta <epuertat@xxxxxxxxxx> wrote:
> Dear Cephers,
>
> Today brought us an eventful CLT meeting: it looks like Jitsi recently
> started requiring user authentication
> <https://jitsi.org/blog/authentication-on-meet-jit-si/> (anonymous users
> will get a "Waiting for a moderator" modal), but authentication didn't
> work against Google or GitHub accounts, so we had to move to the good old
> Google Meet.
>
> As a result of this, Neha has kindly set up a new private Slack channel
> (#clt) to allow for quicker communication among CLT members (if you
> usually attend the CLT meeting and have not been added, please ping any
> CLT member to request that).
>
> Now, let's move on to the important stuff:
>
> *The latest Pacific Release (v16.2.14)*
>
> *The Bad*
> The 14th drop of the Pacific release has landed with a few hiccups:
>
>    - Some .deb packages were made available on downloads.ceph.com before
>    the release process was complete. Although this is not the first time
>    this has happened, we want to ensure it is the last, so we'd like to
>    gather ideas to improve the release publishing process. Neha
>    encouraged everyone to share ideas here:
>       - https://tracker.ceph.com/issues/62671
>       - https://tracker.ceph.com/issues/62672
>    - v16.2.14 also hit issues during the ceph-container stage. Laura
>    wanted to raise awareness of its current setbacks
>    <https://pad.ceph.com/p/16.2.14-struggles> and collect ideas to tackle
>    them:
>       - Enforce reviews and mandatory CI checks
>       - Rework the current approach to use simple Dockerfiles
>       <https://github.com/ceph/ceph/pull/43292>
>       - Call on the Ceph community for help: ceph-container is currently
>       maintained part-time by a single contributor (Guillaume Abrioux).
>       This sub-project would benefit from the sound container expertise
>       among Ceph users. If you have ever considered contributing to Ceph
>       but felt a bit intimidated by C++, Paxos, and race conditions,
>       ceph-container is a good place to shed your fear.
>
> *The Good*
> Not everything about v16.2.14 was bleak: David Orman brought us really
> good news. They tested v16.2.14 on a large production cluster (10 Gbit/s+
> RGW and ~13 PiB raw) and found that it solved a major issue affecting RGW
> in Pacific <https://github.com/ceph/ceph/pull/52552>.
>
> *The Ugly*
> During that testing, they noticed that ceph-mgr was occasionally OOM
> killed (nothing new in 16.2.14, as it was previously reported). They
> already tried the following (sketched as commands after the next list):
>
>    - Disabling modules (like the restful one, which was a suspect)
>    - Enabling debug 20
>    - Turning the pg autoscaler off
>
> Debugging will continue to characterize this issue:
>
>    - Enable profiling (Mark Nelson)
>    - Try Bloomberg's Python memory profiler
>    <https://github.com/bloomberg/memray> (Matthew Leonard); see the
>    second sketch below.
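>
> A minimal sketch of the mitigations above, driven through the standard
> ceph CLI from Python. Assumptions: "debug 20" refers to the mgr debug
> level (debug_mgr), and "mypool" is a placeholder pool name:
>
> ```python
> #!/usr/bin/env python3
> """Apply the ceph-mgr OOM mitigations listed above via the ceph CLI."""
> import subprocess
>
>
> def ceph(*args: str) -> None:
>     """Run a ceph CLI command, echoing it for the operator's log."""
>     cmd = ["ceph", *args]
>     print("+", " ".join(cmd))
>     subprocess.run(cmd, check=True)
>
>
> # 1. Disable suspect mgr modules (the restful module was one suspect).
> ceph("mgr", "module", "disable", "restful")
>
> # 2. Raise mgr logging verbosity (assumed: what "debug 20" refers to).
> ceph("config", "set", "mgr", "debug_mgr", "20")
>
> # 3. Turn the PG autoscaler off: default for new pools, then per pool.
> ceph("config", "set", "global", "osd_pool_default_pg_autoscale_mode", "off")
> ceph("osd", "pool", "set", "mypool", "pg_autoscale_mode", "off")
> ```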
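>
> And a minimal sketch of the memray idea, assuming the suspect allocation
> pattern can be reproduced in a standalone script (suspect_workload below
> is a placeholder, not actual ceph-mgr code):
>
> ```python
> """Trace memory allocations with Bloomberg's memray (pip install memray)."""
> import memray
>
>
> def suspect_workload() -> int:
>     # Placeholder allocation-heavy code so the report has something to show.
>     blobs = [bytes(1024 * 1024) for _ in range(64)]
>     return len(blobs)
>
>
> # Write an allocation trace while the workload runs, then render it with:
> #   memray flamegraph mgr-mem-profile.bin
> with memray.Tracker("mgr-mem-profile.bin"):
>     suspect_workload()
> ```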
>
> *Infrastructure*
>
> *Reminder: Infrastructure Meeting Tomorrow, 11:30-12:30 Central Time*
>
> Patrick brought up the following topics:
>
>    - Need to reduce the OVH spending ($72k/year, which is a sizeable cut
>    of the Ceph Foundation budget; that's a lot fewer avocado sandwiches
>    for the next Cephalocon):
>       - Move services (e.g. Chacra) to the Sepia lab
>       - Re-use CentOS (and any spare/unused) machines for devel purposes
>    - The current Ceph sysadmins are overloaded, so devel/community
>    involvement would be much appreciated.
>    - More to be discussed in tomorrow's meeting. Please join if you
>    think you can help solve/improve the Ceph infrastructure!
>
> *BTW*: today's CDM will be canceled, since no topics were proposed.
>
> Kind Regards,
>
> Ernesto

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx