On Fri, Sep 28, 2018 at 7:26 AM Xiangyang Yu <penglaiyxy@xxxxxxxxx> wrote:
>
> Hi cephers,
>
> In Ceph version 10.2.11, we use three-way replication to protect our
> production data.
>
> Today I found some inconsistent PGs. I tried to repair them, but the
> PGs still show inconsistent.
>
> Digging into the problem, I found that all three replicas have the
> same data digest and omap digest in both object_info_t and the
> scrub-map object, but the data digest in object_info_t differs from
> the data digest in the scrub-map object. The log shows "failed to
> pick suitable auth object", and the repair exits without fixing the
> data digest in object_info_t.
>
> In Hammer, when this situation appeared, the code would repair the
> data digest in object_info_t, but in Jewel and Luminous nothing is
> done.

I think in Hammer it just didn't notice that there was an issue in
this case? Or at least, it didn't check the data digest before
"recovering" from the primary.

> So I want to check: do the objects need to be repaired when this
> situation is encountered? Did Jewel and Luminous miss this case?

It sounds like what's actually happened is that the object's metadata
disagrees with the data, so all Ceph knows is that it *doesn't* have
the correct data, and it's refusing to do a fake repair. It'd be nice
if we could force it to accept what's there, but we don't want to do
that automatically!

A workaround for now might just be to copy the object(s) off the local
storage and then put them back in via RADOS...
-Greg

> If we want to repair it, I will open a tracker issue and submit a
> pull request.
>
> Cheers,
> brandy
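
The copy-out/copy-back workaround Greg describes could look roughly
like the following. This is a hedged sketch, not an official procedure:
the pool name "rbd", PG id "1.23", and object name "obj-A" are all
placeholders for your own cluster, and rewriting the object this way
only carries the object data (not omap or xattrs), so it is only
appropriate here because the omap digests already agree. Verify the
object contents look sane before putting them back:

```shell
# 1. Identify the inconsistent objects in the PG (Jewel and later):
rados list-inconsistent-obj 1.23 --format=json-pretty

# 2. Read the object out through RADOS (this reads the data as stored):
rados -p rbd get obj-A /tmp/obj-A.bak

# 3. Write it back; the rewrite records a fresh data digest in
#    object_info_t that matches the data actually written:
rados -p rbd put obj-A /tmp/obj-A.bak

# 4. Re-run a deep scrub and repair on the PG and check it goes clean:
ceph pg deep-scrub 1.23
ceph pg repair 1.23
```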