Hi,

On Fri, Apr 29, 2022 at 6:54, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> On Thu, Apr 28, 2022 at 9:27 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> >
> > On Thu, Apr 28, 2022 at 1:06 AM Yuri Weinstein <yweinste@xxxxxxxxxx> wrote:
> > >
> > > We are seeing issues during tests, visible in upgrade tests
> > > (http://pulpito.front.sepia.ceph.com/yuriw-2022-04-27_14:24:25-upgrade:octopus-x-pacific-distro-default-smithi/)
> > >
> > > Looks like trackers:
> > > https://tracker.ceph.com/issues/55444
> > > https://tracker.ceph.com/issues/55475
> > >
> > > Casey, Ilya pls advise if they have to be addressed before 16.2.8 release
> >
> > The RBD failures are rather puzzling but seem to be reproducible (at
> > least in teuthology, not locally so far). I doubt it is something new
> > so probably not a blocker for -rc -- will investigate in parallel.
>
> Neha, this looks like a critical omap handling regression on the OSD
> side to me. Definitely a blocker for 16.2.8. I have reassigned the
> above RBD ticket to you.

I encountered data corruption in my cluster. After I upgraded from
v16.2.7 plus some patches (PR#43581, 44413, 45502, 45654) to this
version plus the PR#45963 patch, an unfound object appeared. While I
was trying to fix this problem, the unfound object disappeared, but at
least one PG is now inconsistent.

Here is what I did after the upgrade:

1. Stopped an OSD related to the unfound object. The unfound object
   then disappeared, but an inconsistent PG appeared.
2. To resolve the inconsistency, downgraded Ceph to the previous
   version (which does not contain PR#45963). Some OSDs then started
   to crash.
3. Went back to the new version (which contains PR#45963). Additional
   inconsistent PGs then appeared.
4. Fixed some of them by following the document below, but at least
   one PG is still inconsistent, and there might be other problematic
   PGs.
   https://docs.ceph.com/en/latest/rados/operations/pg-repair/

Does anyone know whether it is possible to resolve this corruption
and, if so, how? I'll upgrade Ceph to v16.2.7 + PR#46096, but I am not
sure whether that upgrade will resolve my issue.

Additional information:

- Some OSDs were created in v16.2.z and the others in v15.2.z or
  older.
- `rados list-inconsistent-obj` reports no inconsistent objects, as
  shown here, even though this PG is inconsistent.

    rados list-inconsistent-obj 2.57 | jq .
    {
      "epoch": 90596,
      "inconsistents": []
    }

Thanks,
Satoru

>
> Yuri, the natural suspect is https://github.com/ceph/ceph/pull/45963.
> I would suggest building a pacific branch without it and rerunning my
> test run:
>
> https://pulpito.ceph.com/dis-2022-04-28_17:15:07-upgrade:octopus-x-pacific-distro-default-smithi/
>
> Thanks,
>
>                 Ilya
> _______________________________________________
> Dev mailing list -- dev@xxxxxxx
> To unsubscribe send an email to dev-leave@xxxxxxx
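
On the empty `rados list-inconsistent-obj` result in the message: the following is a minimal sketch, assuming the command's JSON output carries the `epoch` and `inconsistents` keys shown there, of how such a report could be checked from a script. The function name `summarize_inconsistent` and the `pg_flagged` parameter are hypothetical and exist only for illustration. Reading an empty list on a flagged PG as "the stored scrub result is stale" is an assumption, not something this thread confirms; in that case a fresh `ceph pg deep-scrub 2.57` may be needed before the listing shows anything.

```python
import json


def summarize_inconsistent(report_text, pg_flagged=True):
    """Summarize JSON printed by `rados list-inconsistent-obj <pgid>`.

    pg_flagged is a hypothetical parameter: pass True when the cluster
    status marks the PG as inconsistent.
    """
    report = json.loads(report_text)
    items = report.get("inconsistents", [])
    return {
        "epoch": report.get("epoch"),
        "count": len(items),
        # Assumption: a flagged PG with an empty list suggests the stored
        # scrub result no longer matches the PG, so a new deep scrub is
        # worth trying before attempting a repair.
        "rescrub_suggested": pg_flagged and len(items) == 0,
    }


# The exact report from the message above.
sample = '{"epoch": 90596, "inconsistents": []}'
summary = summarize_inconsistent(sample)
print(summary)
```

The sketch only inspects the report; it does not contact a cluster or run any repair.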