Hi Neha,

On Thu, Sep 1, 2022 at 21:16, Neha Ojha <nojha@xxxxxxxxxx> wrote:
>
> Hi Satoru,
>
> Apologies for the delay in responding to your questions.
>
> In the case of https://github.com/ceph/ceph/pull/45963, we caught the
> bug in an upgrade test (as described in
> https://tracker.ceph.com/issues/55444) and not in the rados test
> suite. Our upgrade test suites are meant to be run before we do point
> releases, to ensure backward compatibility with N-2 and N-1 releases.
> As I mentioned in https://tracker.ceph.com/issues/55444#note-12, the
> same test scenario was run as a part of the rados suite before the
> original PR was merged, but we did not catch this bug then, because
> we were testing the patch against pacific, where resharding is on by
> default. When the same test scenario was run as a part of the upgrade
> test, resharding was not on by default, so we caught this issue. We
> have minimal upgrade tests within the rados suite, but unfortunately,
> they weren't enough to catch this particular issue.

Thank you for the detailed explanation!

> Rest of the answers are inline.
>
> On Fri, Aug 26, 2022 at 2:52 AM Satoru Takeuchi
> <satoru.takeuchi@xxxxxxxxx> wrote:
> >
> > Could anyone answer this question? There are many questions, but it
> > would of course be really helpful to know the answer to even just
> > one of them.
> >
> > The summary of my questions:
> >
> > - a. About the QA process
> >   - a.1 Does the number of test cases differ between the QA for
> >     merging a PR and the QA for a release?
> >   - a.2 If a.1 is correct, is it possible to change the CI system
> >     to run all test cases in both QAs?
>
> Our testing framework, teuthology, uses a concept of subsets to
> determine the number of jobs scheduled in a run; you can find
> detailed information about it in
> https://docs.ceph.com/en/latest/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-intro/#nested-subsets
> I have already explained what was different in the case of the bug
> you referred to. We can certainly encourage more use of our upgrade
> suites before merging critical PRs.

I agree with this idea. It would catch critical bugs as early as
possible.

> >   - a.3 How can I run Teuthology in my local environment?
>
> At this point, we have the ability to run some tests locally using
> teuthology. Junior (cc'ed here) did a presentation on this topic,
> which was recorded here: https://www.youtube.com/watch?v=wZHcg0oVzhY

Thank you very much. I will definitely watch this video and will try
again to run Teuthology.

> > - b. Is there any way to know the data format of each OSD and MON
> >   (e.g. created in version A with configuration B) by issuing Ceph
> >   commands?
>
> I am not quite sure I understand your question.

OK, I should elaborate on this question.

Users sometimes hit problems only in OSDs created by older Ceph
versions. In my case, I hit the bug discussed in PR#45963 only in
pre-pacific OSDs. IIRC, it was caused by a change in the naming
convention of the RocksDB keys used to store omap data. In addition,
I recently found another example:

https://tracker.ceph.com/issues/56488

One of the reasons why this kind of bug happens is that older OSDs are
not well tested against new versions. However, running the full test
suites against all kinds of OSDs would be impractical. To minimize the
risk of this kind of bug, I'd like to keep my OSDs as new as possible.
In other words, I'd like to replace old OSDs with new ones whenever my
cluster is upgraded and the new version changes the on-disk format of
OSDs (like the above-mentioned RocksDB key naming convention).

To achieve this goal, I'd like to know whether there is a way to tell
which OSDs I should re-create. Useful information would be the OSD
creation timestamp, the Ceph version with which the OSD was created,
the version of the OSD data format, and so on. I also mentioned MON
because I thought that MONs might change their format in the same way
as OSDs.

If there is no such way and it should be discussed in an issue, I'll
open a new one.
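To make it concrete, below is a rough sketch of the kind of check I
have in mind. It is only an illustration, not something I have
verified on every release: it assumes the ceph CLI is available and
that "ceph osd metadata" exposes the created_at and
ceph_version_when_created keys (I believe recent releases add them,
but OSDs created long ago may simply not have them, so the script
treats them as optional).

#!/usr/bin/env python3
# Rough sketch: list, for each OSD, when it was created and with which
# Ceph version, based on "ceph osd metadata". The created_at and
# ceph_version_when_created keys are an assumption here; they may be
# missing on older releases or on OSDs created before they were added.
import json
import subprocess

def osd_metadata():
    # "ceph osd metadata" without an OSD id reports metadata for all OSDs.
    out = subprocess.check_output(
        ["ceph", "osd", "metadata", "--format", "json"])
    return json.loads(out)

def main():
    for md in osd_metadata():
        osd_id = md.get("id")
        created_at = md.get("created_at", "unknown")
        created_with = md.get("ceph_version_when_created", "unknown")
        running = md.get("ceph_version", "unknown")
        print("osd.{}: created_at={} created_with={} running={}".format(
            osd_id, created_at, created_with, running))

if __name__ == "__main__":
    main()

With something like this, OSDs whose creation version predates a known
on-disk format change (or which report "unknown") would be candidates
for re-creation. As far as I can tell, though, there is no field that
directly describes the on-disk data format version itself, which is
why I'm asking here.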
Best,
Satoru