Hi Rok,
We're still trying to pin down what's causing the memory growth, so it's
hard to say which releases are affected. We know it's happening
intermittently on at least one live Pacific cluster. If you can catch it
while it's happening, there are several approaches/tools that might help
diagnose it. Container deployments are a bit tougher to get debugging
tools working in, though, which as far as I know has slowed down
existing attempts at diagnosing the issue.
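For example, assuming your ceph-mgr build links tcmalloc and accepts the
heap tell commands the way OSDs and MONs do (worth verifying on your
build first), a rough starting point would be:

    # Check heap usage as reported by tcmalloc
    ceph tell mgr heap stats

    # Profile allocations across a window where the growth occurs
    ceph tell mgr heap start_profiler
    ceph tell mgr heap dump
    ceph tell mgr heap stop_profiler

    # Ask tcmalloc to return freed-but-retained memory to the OS
    ceph tell mgr heap release

The dump files land alongside the daemon's logs and can be inspected
with pprof, per the Ceph memory profiling docs.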
Mark
On 9/7/23 05:55, Rok Jaklič wrote:
Hi,
we have also experienced several ceph-mgr OOM kills on Ceph v16.2.13, on
a cluster with 120T/200T data.
Is there any tracker about the problem?
Does upgrading to 17.x "solve" the problem?
Kind regards,
Rok
On Wed, Sep 6, 2023 at 9:36 PM Ernesto Puerta <epuertat@xxxxxxxxxx> wrote:
Dear Cephers,
Today brought us an eventful CLT meeting: it looks like Jitsi recently
started requiring user authentication
<https://jitsi.org/blog/authentication-on-meet-jit-si/> (anonymous users
will get a "Waiting for a moderator" modal), but authentication didn't
work against Google or GitHub accounts, so we had to move to the good
old Google Meet.
As a result of this, Neha has kindly set up a new private Slack channel
(#clt) to allow for quicker communication among CLT members (if you usually
attend the CLT meeting and have not been added, please ping any CLT member
to request that).
Now, let's move on to the important stuff:
*The latest Pacific Release (v16.2.14)*
*The Bad*
The 14th drop of the Pacific release has landed with a few hiccups:
- Some .deb packages were made available on downloads.ceph.com before
the release process was complete. Although this is not the first time it
has happened, we want to ensure it is the last, so we'd like to gather
ideas to improve the release publishing process. Neha encouraged
everyone to share ideas here:
  - https://tracker.ceph.com/issues/62671
  - https://tracker.ceph.com/issues/62672
- v16.2.14 also hit issues during the ceph-container stage. Laura
wanted to raise awareness of its current setbacks
<https://pad.ceph.com/p/16.2.14-struggles> and collect ideas to tackle
them:
  - Enforce reviews and mandatory CI checks
  - Rework the current approach to use simple Dockerfiles
    <https://github.com/ceph/ceph/pull/43292>
  - Call on the Ceph community for help: ceph-container is currently
    maintained part-time by a single contributor (Guillaume Abrioux).
    This sub-project would benefit from the sound container expertise
    among Ceph users. If you have ever considered contributing to Ceph
    but felt a bit intimidated by C++, Paxos, and race conditions,
    ceph-container is a good place to shed your fear.
*The Good*
Not everything about v16.2.14 was going to be bleak: David Orman brought
us really good news. They tested v16.2.14 on a large production cluster
(10 Gbit/s+ of RGW traffic and ~13 PiB raw) and found that it solved a
major issue affecting RGW in Pacific
<https://github.com/ceph/ceph/pull/52552>.
*The Ugly*
During that testing, they noticed that ceph-mgr was occasionally OOM
killed (nothing new in 16.2.14; it had been reported before). They have
already tried the following (example commands after the list):
- Disabling modules (like the restful one, which was a suspect)
- Enabling debug 20
- Turning the pg autoscaler off
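For reference, a sketch of how those mitigations are typically applied
from the CLI (the pool name is a placeholder; double-check option names
against your release):

    # Disable a suspect mgr module (here, restful)
    ceph mgr module disable restful

    # Raise mgr logging to maximum verbosity
    ceph config set mgr debug_mgr 20

    # Turn the PG autoscaler off, per pool or as the default for new pools
    ceph osd pool set <pool> pg_autoscale_mode off
    ceph config set global osd_pool_default_pg_autoscale_mode off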
Debugging will continue to characterize the issue:
- Enable profiling (Mark Nelson)
- Try Bloomberg's Python memory profiler, memray
  <https://github.com/bloomberg/memray> (Matthew Leonard)
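For anyone who wants to get familiar with memray before pointing it at a
live mgr, a minimal sketch of its usual workflow (the script name and
output file below are illustrative):

    # Install the profiler
    pip install memray

    # Record all allocations made while the workload runs
    python -m memray run -o allocations.bin my_workload.py

    # Render the recording as an HTML flamegraph
    python -m memray flamegraph allocations.bin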
*Infrastructure*
*Reminder: Infrastructure Meeting Tomorrow, 11:30-12:30 Central Time*
Patrick brought up the following topics:
- Need to reduce the OVH spending ($72k/year, a sizable chunk of the
Ceph Foundation budget; that's a lot fewer avocado sandwiches for the
next Cephalocon):
  - Move services (e.g. Chacra) to the Sepia lab
  - Re-use CentOS (and any spare/unused) machines for devel purposes
- Current Ceph sysadmins are overloaded, so devel/community involvement
would be much appreciated.
- More to be discussed in tomorrow's meeting. Please join if you think
you can help solve/improve the Ceph infrastructure!
*BTW*: today's CDM will be canceled, since no topics were proposed.
Kind Regards,
Ernesto
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx