Re: [PATCH] ceph: Introduce CONFIG_CEPH_LIB_DEBUG and CONFIG_CEPH_FS_DEBUG

Ilya Dryomov <idryomov@xxxxxxxxx> · Thu, 16 Jan 2025 18:04:55 +0100

On Thu, Jan 16, 2025 at 3:01 AM Viacheslav Dubeyko
<Slava.Dubeyko@xxxxxxx> wrote:
>
> Hi Ilya,
>
> On Thu, 2025-01-16 at 00:04 +0100, Ilya Dryomov wrote:
> > On Wed, Jan 15, 2025 at 1:41 AM Viacheslav Dubeyko
> > <Slava.Dubeyko@xxxxxxx> wrote:
> > >
> > >
>
> <skipped>
>
> > >
> > > -void ceph_msg_data_cursor_init(struct ceph_msg_data_cursor
> > > *cursor,
> > > -                              struct ceph_msg *msg, size_t length)
> > > +int ceph_msg_data_cursor_init(struct ceph_msg_data_cursor *cursor,
> > > +                             struct ceph_msg *msg, size_t length)
> > >  {
> > > +#ifdef CONFIG_CEPH_LIB_DEBUG
> > >         BUG_ON(!length);
> > >         BUG_ON(length > msg->data_length);
> > >         BUG_ON(!msg->num_data_items);
> > > +#else
> > > +       if (!length)
> > > +               return -EINVAL;
> > > +
> > > +       if (length > msg->data_length)
> > > +               return -EINVAL;
> > > +
> > > +       if (!msg->num_data_items)
> > > +               return -EINVAL;
> > > +#endif /* CONFIG_CEPH_LIB_DEBUG */
> >
> > Hi Slava,
> >
> > I don't think this is a good idea.  I'm all for returning errors
> > where
> > it makes sense and is possible and such cases don't actually need to
> > be
> > conditioned on a CONFIG option.  Here, this EINVAL error would be
> > raised very far away from the cause -- potentially seconds later and
> > in
> > a different thread or even a different kernel module.  It would still
> > (eventually) hang the client because the messenger wouldn't be able
> > to
> > make progress for that connection/session.
> >
>
> First of all, let's split the patch on two parts:
> (1) CONFIG options suggestion;
> (2) practical application of CONFIG option.
>
> I believe that such CONFIG option is useful for adding
> pre-condition and post-condition checks in methods that
> could be executed in debug compilation and it will be
> excluded from release compilation for production case.
>
> Potentially, the first application of this CONFIG option
> is not good enough. However, the kernel crash is good for
> the problem investigation (debug compilation, for example),
> but end-user would like to see working kernel but not crashed one.
> And returning error is a way to behave in a nice way,
> from my point of view.

We can definitely consider such a CONFIG option where there is a good
application for it.

>
> > With this patch in place, in the scenario that you have been chasing
> > where CephFS apparently asks to read X bytes but sets up a reply
> > message with a data buffer that is smaller than X bytes, the
> > messenger
> > would enter a busy loop, endlessly reporting the new error,
> > "faulting",
> > reestablishing the session, resending the outstanding read request
> > and
> > attempting to fit the reply into the same (short) reply message.  I'd
> > argue that an endless loop is worse than an easily identifiable
> > BUG_ON
> > in one of the kworker threads.
> >
> > There is no good way to process the new error, at least not with the
> > current structure of the messenger.  In theory, the read request
> > could
> > be failed, but that would require wider changes and a bunch of
> > special
> > case code that would be there just to recover from what could have
> > been
> > a BUG_ON for an obvious programming error.
> >
>
> Yes, I totally see your point. But I believe that as kernel crash as
> busy loop is wrong behavior. Ideally, we need to report the error and
> continue to work without kernel crash or busy loop. Would we rework
> the logic to be more user-friendly and to behave more nicely?

I'm not sure it would be worth the effort in this particular case.

> I don't quite follow why do we have busy loop even if we know that
> request is failed? Generally speaking, failed request should be
> discarded, from the common sense. :)

The messenger assumes that most errors are transient, so it simply
reestablishes the session and resends outstanding requests.  The main
reason for this is that depending on how far in the message the error
is raised, a corresponding request may not be known yet (consider
a scenario where the error pops up before the messenger gets to the
fields that identify the request, for example) or there may not be
a external request to fail at all.  If the request is identified and
the error is assumed to be permanent, the request can indeed be failed
to the submitter, but currently there is no support for that.  There is
something more crude where the OSD client can tell the messenger to
skip the message and move on -- see get_reply() and @skip parameter
in net/ceph/osd_client.c.  Normally it's used to skip over duplicate or
misdirected messages, but it can also be used to skip a message on
a would-be-permanent error that is associated with that particular
message.  With that, the submitter is never going to see a reply to
that request and would likely get stuck due to that at some point, but
once again these are almost always basic logic errors.

Thanks,

                Ilya