Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

Hi all,

We also face the same, or at least a very similar, problem. We are running Pacific (16.2.6 and 16.2.7, upgraded from 16.2.x to y to z) on both sides of the RGW multisite. In our case, the scrub errors occur on the secondary side only.
Even without adding many RGW objects (only a few PUTs per minute), we accumulate thousands and thousands of RGW bucket.sync log entries in the RGW log pool (this seems to be a separate problem), and as a result we end up with "large omap objects" over time.
Maybe the large omap objects are part of the scrub error problem?
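
For reference, a rough sketch of how the large omap objects can be tracked down (the pool and object names below are only placeholders for the secondary zone's log pool; log paths depend on the deployment):

    # the health warning and the cluster log name the affected objects
    ceph health detail
    grep -i 'large omap object' /var/log/ceph/ceph.log

    # count the omap keys of one of the reported objects directly
    rados -p <zone>.rgw.log listomapkeys <object-name> | wc -l

    # the key-count threshold that triggers the warning during deep scrub
    ceph config get osd osd_deep_scrub_large_omap_object_key_threshold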

Regards,
Stefan

> On 21.12.2021 at 09:03, Christian Rohmann <christian.rohmann@xxxxxxxxx> wrote:
> 
> Hello Eugen,
> 
> On 20/12/2021 22:02, Eugen Block wrote:
>> You wrote that this cluster was initially installed with Octopus, so no Ceph upgrade? Are all RGW daemons on the exact same Ceph (minor) version?
>> I remember one of our customers reporting inconsistent objects on a regular basis, although no hardware issues were detectable. They replicate between two sites, too. A couple of months ago both sites were updated to the exact same Ceph minor version (also Octopus), and they haven't faced inconsistencies since. I don't have details about the Ceph version(s), only that both sites were initially installed with Octopus. Maybe it's worth checking your versions?
> 
> 
> Yes, everything has the same version:
> 
>> {
>> [...]
>>    "overall": {
>>        "ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)": 34
>>    }
>> }
>> 
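> (The "overall" section above is what ceph versions reports; that command also breaks the versions down per daemon type. Individual daemons could additionally be queried, e.g.:
> 
>     ceph tell 'osd.*' version
> 
> to confirm the running daemons really match the installed packages.)
> 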
> I just observed another 3 scrub errors. Strangely, they never seem to occur on the same PGs again.
> I shall run another deep scrub on those OSDs to narrow this down.
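> 
> (For reference, one way to narrow this down; the pool name and pg id below are only placeholders:
> 
>     # list PGs of the suspect pool that are currently flagged inconsistent
>     rados list-inconsistent-pg default.rgw.log
> 
>     # show details of the inconsistent objects in one of those PGs
>     rados list-inconsistent-obj <pgid> --format=json-pretty
> 
>     # trigger another deep scrub of that PG
>     ceph pg deep-scrub <pgid>
> )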
> 
> 
> 
> But I somewhat suspect this to be a potential issue with the OMAP validation part of scrubbing.
> For RADOSGW there are large OMAP structures with lots of churn, and the issues occur only on the metadata pools.
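> 
> (Comparing the per-shard omap digests from the inconsistency report should at least show which replica disagrees; the exact JSON layout may differ slightly between releases:
> 
>     rados list-inconsistent-obj <pgid> --format=json-pretty \
>       | jq '.inconsistents[] | {object: .object.name, shards: [.shards[] | {osd, omap_digest, errors}]}'
> )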
> 
> 
> 
> 
> Regards
> 
> 
> Christian

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



