Re: Random scrub errors (omap_digest_mismatch) on pgs of RADOSGW metadata pools (bug 53663)

Hey Stefan,

thanks for getting back to me!


On 10/02/2022 10:05, Stefan Schueffler wrote:
Since my last mail in December, we changed our Ceph setup like this:

We added one SSD OSD on each Ceph host (which were pure HDD before). Then we moved the problematic pool "de-dus5.rgw.buckets.index" to those dedicated SSDs (by adding a corresponding CRUSH rule).
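Roughly, that move boils down to a device-class CRUSH rule plus re-pointing the pool at it; a minimal sketch (the rule name "rgw-index-ssd" is just illustrative, and the new OSDs need to carry the "ssd" device class):

    # replicated rule that only picks OSDs with device class "ssd", host failure domain
    ceph osd crush rule create-replicated rgw-index-ssd default host ssd
    # point the index pool at the new rule; the data migrates automatically
    ceph osd pool set de-dus5.rgw.buckets.index crush_rule rgw-index-ssd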

Since then, no further PG corruptions have occurred.

This has a two-sided result:

On the one hand, we no longer observe the problematic behavior;

On the other hand, it means that something in Ceph is buggy when running on spinning HDDs alone. Even if the HDDs cannot keep up with the I/O demand, that should not lead to data/PG corruption…
And, just a blind guess: we only see a few I/O requests per second on our RGW gateway - even with spinning HDDs it should not be a problem to store/update the index pool.

I would guess it correlates with our setup having 7001 shards in the problematic bucket, combined with the "multisite" feature, which issues 7001 "status" requests per second to check and synchronize between the different RGW sites. And _this_ amount of random I/O cannot be satisfied by HDDs…
In any case, it should not lead to corrupted PGs.
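To cross-check that guess, the per-bucket shard count and the multisite sync backlog can be read off with the standard tooling, e.g.:

    # per-bucket shard count and objects-per-shard fill status
    radosgw-admin bucket limit check
    # current replication / sync state of the multisite setup
    radosgw-admin sync status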


We also have a multi-site setup, with one HDD-only cluster and one cluster (the primary) with NVMe SSDs for the OSD journaling. There are more inconsistencies on the HDD-only cluster, but we do observe them on the other cluster as well.

If you follow the issue at https://tracker.ceph.com/issues/53663, there is now another user (Dieter Roels) observing it as well. He suspects RADOSGW crashes may be causing the inconsistencies. We had already guessed it could be rolling restarts, but we cannot put our finger on it yet.
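For comparing notes, this is roughly how we look at the scrub errors on our side (standard commands; <pgid> is a placeholder for one of the inconsistent PGs):

    # which PGs are currently flagged inconsistent
    ceph health detail
    rados list-inconsistent-pg <your-index-pool>
    # per-object detail for one PG, including the omap_digest mismatch per shard
    rados list-inconsistent-obj <pgid> --format=json-pretty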

And yes, no amount of I/O contention should ever cause data corruption.
In this case I believe there may be a correlation with the multisite feature hitting OMAP and stored metadata much harder than regular RADOSGW usage does. If there is a race condition or a missing lock/semaphore or something along those lines, it would certainly be affected by the latency of the underlying storage.



Could you maybe manually trigger a deep-scrub on all your OSDs, just to see if that turns up anything?
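Something along these lines should do it (the osd_max_scrubs limit will still pace the actual scrubbing):

    # ask every OSD to deep-scrub the PGs it hosts
    for osd in $(ceph osd ls); do ceph osd deep-scrub "$osd"; done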




Thanks again for keeping in touch!
Regards


Christian






_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



