On Thu, Jan 16, 2025 at 2:38 AM Viacheslav Dubeyko <Slava.Dubeyko@xxxxxxx> wrote:
>
> Hi Ilya,
>
> On Thu, 2025-01-16 at 00:21 +0100, Ilya Dryomov wrote:
> > On Wed, Jan 15, 2025 at 9:53 PM Viacheslav Dubeyko
> > <Slava.Dubeyko@xxxxxxx> wrote:
> > >
> > > Hello,
> > >
> > > The libceph subsystem can generate an enormous amount of
> > > messages in the case of an error.  As a result, the system log
> > > can grow unreasonably big because of such a messaging
> > > policy.  This patch switches to the ratelimited versions of
> >
> > Hi Slava,
> >
> > Do you have an example (which is not caused by a programming error)?
> >
>
> Frankly speaking, there is no stable ground for defining what
> a programming error is. :)  And if the end user sees some messages
> in the system log, it's not always clear what the reason for them
> is (faulty hardware, wrong configuration, a network issue, or
> a programming error).
>
> Currently, while running xfstests, I can see some sporadically
> triggered issues (which I am going to investigate).  For example,
> today I can reproduce one with generic/127 (even though it passed
> successfully multiple times before).  The outcome of this issue is
> an endless sequence of messages in the system log:
>
> Jan 15 16:39:06 ceph-testing-0001 kernel: [ 4345.164299] libceph: mon2
> (2)127.0.0.1:40902 socket error on write
> Jan 15 16:39:06 ceph-testing-0001 kernel: [ 4345.164321] libceph: mon1
> (2)127.0.0.1:40900 socket error on write
> Jan 15 16:39:06 ceph-testing-0001 kernel: [ 4345.668314] libceph: mon1
> (2)127.0.0.1:40900 socket error on write
> Jan 15 16:39:06 ceph-testing-0001 kernel: [ 4345.668337] libceph: mon2
> (2)127.0.0.1:40902 socket error on write
> Jan 15 16:39:07 ceph-testing-0001 kernel: [ 4346.660371] libceph: mon2
> (2)127.0.0.1:40902 socket error on write
>
> <skipped>
>
> Jan 15 17:16:30 ceph-testing-0001 kernel: [ 6589.691303] libceph: mon2
> (2)127.0.0.1:40902 socket error on write
> Jan 15 17:16:31 ceph-testing-0001 kernel: [ 6590.907396] libceph: osd1
> (2)127.0.0.1:6810 socket error on write
> Jan 15 17:16:34 ceph-testing-0001 kernel: [ 6593.659370] libceph: mon2
> (2)127.0.0.1:40902 socket error on write
> Jan 15 17:16:37 ceph-testing-0001 kernel: [ 6597.051461] libceph: mon2
> (2)127.0.0.1:40902 socket error on write
>
> <continues to spam the system log until the system restarts>

If there is an infinite loop running in the background, one has a
problem no matter whether the messages are ratelimited or not ;)
A blanket change to impose a limit on all libceph messages isn't
going to help.

> > > pr_notice(), pr_info(), pr_warn(), and pr_err()
> > > methods by means of introducing libceph_notice(),
> > > libceph_info(), libceph_warn(), and libceph_err()
> > > methods.
> >
> > Some of the libceph messages are already ratelimited and the
> > standard pr_*_ratelimited macros are used for that.  They are few
> > and far between, so if there is a particular message that is too
> > spammy, switching it to a ratelimited version shouldn't be
> > a problem, but we won't take a blanket conversion like this.
> >
>
> Yes, I agree that even a ratelimited version of messaging cannot
> solve the problem of spamming the system log with info, warning, or
> error messages.  As far as I can see, we have an infinite loop in
> the libceph core library that generates this never-ending sequence
> of messages.  I believe that this is not user-friendly behavior and
> we need to rework it somehow.  I still don't quite follow why the
> libceph core library's logic keeps retrying the same action and
> reporting the error after it has already failed.
> Could we rework it somehow?

I tried to elaborate on the premise in another thread:

The messenger assumes that most errors are transient, so it simply
reestablishes the session and resends outstanding requests.  The main
reason for this is that, depending on how far into the message the
error is raised, the corresponding request may not be known yet
(consider a scenario where the error pops up before the messenger
gets to the fields that identify the request, for example), or there
may not be an external request to fail at all.  It's also how the
userspace messenger works, for the same reason.

> I believe that we have some wrong logic in the current implementation.

AFAIR it's supposed to be looping with an exponential backoff, so it
shouldn't busy-loop.  If it gets into an actual busy loop on a vanilla
kernel, that is something to look into.  The backoff factor and upper
limit could probably be tweaked too, if needed.

Thanks,

                Ilya
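
For reference, the pattern discussed above (retry with an exponential
backoff, and use the stock pr_*_ratelimited macros for any per-message
ratelimiting) amounts to roughly the sketch below.  All of the demo_*
names and DEMO_* constants are invented for illustration; this is not
the actual net/ceph/messenger.c code, only a minimal example of the
technique under those assumptions.

/*
 * Minimal sketch: delayed reconnect with exponential backoff plus
 * a ratelimited log message.  Every identifier here is hypothetical.
 */
#include <linux/kernel.h>
#include <linux/workqueue.h>
#include <linux/printk.h>
#include <linux/jiffies.h>
#include <linux/minmax.h>
#include <linux/types.h>

#define DEMO_BASE_DELAY_MS	100		/* first retry after 100 ms */
#define DEMO_MAX_DELAY_MS	(30 * 1000)	/* cap the backoff at 30 s */

struct demo_con {
	struct delayed_work work;	/* scheduled reconnect attempt */
	unsigned long delay_ms;		/* current backoff interval */
};

/* Hypothetical helper: attempt to re-establish the session. */
static bool demo_try_reconnect(struct demo_con *con);

/* Called whenever a socket error is detected on the connection. */
static void demo_con_fault(struct demo_con *con)
{
	/* Ratelimited so that a long outage doesn't flood the log. */
	pr_warn_ratelimited("demo: socket error, retrying in %lu ms\n",
			    con->delay_ms);

	/* Schedule the next attempt after the current delay ... */
	queue_delayed_work(system_wq, &con->work,
			   msecs_to_jiffies(con->delay_ms));

	/* ... and double the delay for the next fault, up to the cap. */
	con->delay_ms = min_t(unsigned long, con->delay_ms * 2,
			      DEMO_MAX_DELAY_MS);
}

/* Work function: try to reconnect; on failure, back off and retry. */
static void demo_con_workfn(struct work_struct *work)
{
	struct demo_con *con = container_of(work, struct demo_con,
					    work.work);

	if (demo_try_reconnect(con))
		con->delay_ms = DEMO_BASE_DELAY_MS;	/* reset on success */
	else
		demo_con_fault(con);
}

static void demo_con_init(struct demo_con *con)
{
	INIT_DELAYED_WORK(&con->work, demo_con_workfn);
	con->delay_ms = DEMO_BASE_DELAY_MS;
}

With this shape, a long outage produces a bounded number of log lines
(thanks to the ratelimit) and retry attempts that become progressively
less frequent, rather than a busy loop that spams the log.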