Dear Frédéric,

1/ Identify the shards with the most sync error log entries:

I have identified that the shard causing the issue is shard 31, but almost all of the error entries refer to a single object in one bucket. The object exists in the master zone, so I am not sure why the replication site is unable to sync it.

2/ For each shard, list every sync error log entry along with their ids:

radosgw-admin sync error list --shard-id=X

The output of this command mostly shows the same shard and the same object (shard 31 and object /plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png).

3/ Remove them **except the last one** with:

radosgw-admin sync error trim --shard-id=X --marker=1_1682101321.201434_8669.1

Trimming did remove a few entries from the error log, but there are still many entries for the same object that I am unable to trim. The trim command now exits successfully but no longer removes anything.

I am still getting the following error about the object that is not syncing in the radosgw log:

2025-03-15T03:05:48.060+0530 7fee2affd700 0 RGW-SYNC:data:sync:shard[80]:entry[mbackup:70134e66-872072ee2d32.2205852207.1:48]:bucket_sync_sources[target=:[]):source_bucket=:[]):source_zone=872072ee2d32]:bucket[mbackup:70134e66-872072ee2d32.2205852207.1:48<-mod-backup:70134e66-872072ee2d32.2205852207.1:48]:full_sync[mod-backup:70134e66-872072ee2d32.2205852207.1:48]:entry[wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png]: ERROR: failed to sync object: mbackup:70134e66-872072ee2d32.2205852207.1:48/wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png

I have been getting this error for approximately two months and, if I remember correctly, we have been getting the LARGE OMAP warning since around the same time.

I will try to delete this object from the master zone on Monday and see whether that fixes the issue.
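
This is roughly what I plan to run before deleting anything. <bucket> below is only a placeholder, since both "mbackup" and "mod-backup" appear in the log above and I still need to confirm the exact bucket name, and the exact flags may differ on our Ceph release:

# On the replication site: check this bucket's sync state against the master zone
radosgw-admin bucket sync status --bucket=<bucket>

# On the master zone: confirm the problem object exists and stats cleanly
radosgw-admin object stat --bucket=<bucket> --object='wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png'

# If it stats fine but still will not replicate, remove it from the master zone
radosgw-admin object rm --bucket=<bucket> --object='wp-content/plugins/plugins/yellow-pencil-visual-theme-customizer/images/cursor.png'
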
Do you have any other suggestions that I should consider?

Regards,
Danish

On Thu, Mar 13, 2025 at 6:07 PM Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:

> Hi Danish,
>
> Can you access this KB article [1]? A free developer account should allow
> you to.
>
> It pretty much describes what you're facing and suggests trimming the sync
> error log of recovering shards. Actually, every log entry **except the last
> one**.
>
> 1/ Identify the shards with the most sync error log entries:
>
> radosgw-admin sync error list --max-entries=1000000 | grep shard_id | sort -n | uniq -c | sort -h
>
> 2/ For each shard, list every sync error log entry along with their ids:
>
> radosgw-admin sync error list --shard-id=X
>
> 3/ Remove them **except the last one** with:
>
> radosgw-admin sync error trim --shard-id=X --marker=1_1682101321.201434_8669.1
>
> the --marker above being the log entry id.
>
> Are the replication threads running on the same RGWs that S3 clients are
> using?
>
> If so, using dedicated RGWs for the sync job might help you avoid
> non-recovering shards in the future, as described in Matthew's post [2].
>
> Regards,
> Frédéric.
>
> [1] https://access.redhat.com/solutions/7023912
> [2] https://www.spinics.net/lists/ceph-users/msg83988.html
>
> ----- On 12 Mar 25, at 11:15, Danish Khan danish52.jmi@xxxxxxxxx wrote:
>
> > Dear All,
> >
> > My Ceph cluster has been giving the LARGE OMAP warning for approximately 2-3 months.
> > I tried a few things:
> > - Deep scrub of PGs
> > - Compact OSDs
> > - Trim log
> > But these didn't work out.
> >
> > I guess the main issue is that 4 shards on the replication site have been
> > stuck recovering for 2-3 months.
> >
> > Any suggestions are highly appreciated.
> >
> > Sync status:
> > root@drhost1:~# radosgw-admin sync status
> >           realm e259e0a92 (object-storage)
> >       zonegroup 7a8606d2 (staas)
> >            zone c8022ad1 (repstaas)
> >   metadata sync syncing
> >                 full sync: 0/64 shards
> >                 incremental sync: 64/64 shards
> >                 metadata is caught up with master
> >       data sync source: 2072ee2d32 (masterstaas)
> >                         syncing
> >                         full sync: 0/128 shards
> >                         incremental sync: 128/128 shards
> >                         data is behind on 3 shards
> >                         behind shards: [7,90,100]
> >                         oldest incremental change not applied: 2025-03-12T13:14:10.268469+0530 [7]
> >                         4 shards are recovering
> >                         recovering shards: [31,41,55,80]
> >
> > Master site:
> > 1. root@master1:~# for obj in $(rados ls -p masterstaas.rgw.log); do echo "$(rados listomapkeys -p masterstaas.rgw.log $obj | wc -l) $obj"; done | sort -nr | head -10
> > 1225387 data_log.91
> > 1225065 data_log.86
> > 1224662 data_log.87
> > 1224448 data_log.92
> > 1224018 data_log.89
> > 1222156 data_log.93
> > 1201489 data_log.83
> > 1174125 data_log.90
> > 363498 data_log.84
> > 258709 data_log.6
> >
> > 2. root@master1:~# for obj in data_log.91 data_log.86 data_log.87 data_log.92 data_log.89 data_log.93 data_log.83 data_log.90; do rados stat -p masterstaas.rgw.log $obj; done
> > masterstaas.rgw.log/data_log.91 mtime 2025-02-24T15:09:25.000000+0530, size 0
> > masterstaas.rgw.log/data_log.86 mtime 2025-02-24T15:01:25.000000+0530, size 0
> > masterstaas.rgw.log/data_log.87 mtime 2025-02-24T15:02:25.000000+0530, size 0
> > masterstaas.rgw.log/data_log.92 mtime 2025-02-24T15:11:01.000000+0530, size 0
> > masterstaas.rgw.log/data_log.89 mtime 2025-02-24T14:54:55.000000+0530, size 0
> > masterstaas.rgw.log/data_log.93 mtime 2025-02-24T14:53:25.000000+0530, size 0
> > masterstaas.rgw.log/data_log.83 mtime 2025-02-24T14:16:21.000000+0530, size 0
> > masterstaas.rgw.log/data_log.90 mtime 2025-02-24T15:05:25.000000+0530, size 0
> >
> > 3. ceph cluster log:
> > 2025-02-22T04:18:27.324886+0530 osd.173 (osd.173) 19 : cluster [WRN] Large omap object found. Object: 124:b2ddf551:::data_log.93:head PG: 124.8aafbb4d (124.d) Key count: 1218170 Size (bytes): 297085860
> > 2025-02-22T04:18:28.735886+0530 osd.65 (osd.65) 308 : cluster [WRN] Large omap object found. Object: 124:f2081d70:::data_log.92:head PG: 124.eb8104f (124.f) Key count: 1220420 Size (bytes): 295240028
> > 2025-02-22T04:18:30.668884+0530 mon.master1 (mon.0) 7974038 : cluster [WRN] Health check update: 3 large omap objects (LARGE_OMAP_OBJECTS)
> > 2025-02-22T04:18:31.127585+0530 osd.18 (osd.18) 224 : cluster [WRN] Large omap object found. Object: 124:d1061236:::data_log.86:head PG: 124.6c48608b (124.b) Key count: 1221047 Size (bytes): 295398274
> > 2025-02-22T04:18:33.189561+0530 osd.37 (osd.37) 32665 : cluster [WRN] Large omap object found. Object: 124:9a2e04b7:::data_log.87:head PG: 124.ed207459 (124.19) Key count: 1220584 Size (bytes): 295290366
> > 2025-02-22T04:18:35.007117+0530 osd.77 (osd.77) 135 : cluster [WRN] Large omap object found. Object: 124:6b9e929a:::data_log.89:head PG: 124.594979d6 (124.16) Key count: 1219913 Size (bytes): 295127816
> > 2025-02-22T04:18:36.189141+0530 mon.master1 (mon.0) 7974039 : cluster [WRN] Health check update: 5 large omap objects (LARGE_OMAP_OBJECTS)
> > 2025-02-22T04:18:36.340247+0530 osd.112 (osd.112) 259 : cluster [WRN] Large omap object found. Object: 124:0958bece:::data_log.83:head PG: 124.737d1a90 (124.10) Key count: 1200406 Size (bytes): 290280292
> > 2025-02-22T04:18:38.523766+0530 osd.73 (osd.73) 1064 : cluster [WRN] Large omap object found. Object: 124:fddd971f:::data_log.91:head PG: 124.f8e9bbbf (124.3f) Key count: 1221183 Size (bytes): 295425320
> > 2025-02-22T04:18:42.619926+0530 osd.92 (osd.92) 285 : cluster [WRN] Large omap object found. Object: 124:7dc404fa:::data_log.90:head PG: 124.5f2023be (124.3e) Key count: 1169895 Size (bytes): 283025576
> > 2025-02-22T04:18:44.242655+0530 mon.master1 (mon.0) 7974043 : cluster [WRN] Health check update: 8 large omap objects (LARGE_OMAP_OBJECTS)
> >
> > Replica site:
> > 1. for obj in $(rados ls -p repstaas.rgw.log); do echo "$(rados listomapkeys -p repstaas.rgw.log $obj | wc -l) $obj"; done | sort -nr | head -10
> > 432850 data_log.91
> > 432384 data_log.87
> > 432323 data_log.93
> > 431783 data_log.86
> > 431510 data_log.92
> > 427959 data_log.89
> > 414522 data_log.90
> > 407571 data_log.83
> > 151015 data_log.84
> > 109790 data_log.4
> >
> > 2. ceph cluster log:
> > grep -ir "Large omap object found" /var/log/ceph/
> > /var/log/ceph/ceph-mon.drhost1.log:2025-03-12T14:49:59.997+0530 7fc4ad544700 0 log_channel(cluster) log [WRN] : Search the cluster log for 'Large omap object found' for more details.
> > /var/log/ceph/ceph.log:2025-03-12T14:49:02.078108+0530 osd.10 (osd.10) 21 : cluster [WRN] Large omap object found. Object: 6:b2ddf551:::data_log.93:head PG: 6.8aafbb4d (6.d) Key count: 432323 Size (bytes): 105505884
> > /var/log/ceph/ceph.log:2025-03-12T14:49:02.389288+0530 osd.48 (osd.48) 37 : cluster [WRN] Large omap object found. Object: 6:d1061236:::data_log.86:head PG: 6.6c48608b (6.b) Key count: 431782 Size (bytes): 104564674
> > /var/log/ceph/ceph.log:2025-03-12T14:49:07.166954+0530 osd.24 (osd.24) 24 : cluster [WRN] Large omap object found. Object: 6:0958bece:::data_log.83:head PG: 6.737d1a90 (6.10) Key count: 407571 Size (bytes): 98635522
> > /var/log/ceph/ceph.log:2025-03-12T14:49:09.100110+0530 osd.63 (osd.63) 5 : cluster [WRN] Large omap object found. Object: 6:9a2e04b7:::data_log.87:head PG: 6.ed207459 (6.19) Key count: 432384 Size (bytes): 104712350
> > /var/log/ceph/ceph.log:2025-03-12T14:49:08.703760+0530 osd.59 (osd.59) 11 : cluster [WRN] Large omap object found. Object: 6:6b9e929a:::data_log.89:head PG: 6.594979d6 (6.16) Key count: 427959 Size (bytes): 103773777
> > /var/log/ceph/ceph.log:2025-03-12T14:49:11.126132+0530 osd.40 (osd.40) 24 : cluster [WRN] Large omap object found. Object: 6:f2081d70:::data_log.92:head PG: 6.eb8104f (6.f) Key count: 431508 Size (bytes): 104520406
> > /var/log/ceph/ceph.log:2025-03-12T14:49:13.799473+0530 osd.43 (osd.43) 61 : cluster [WRN] Large omap object found. Object: 6:fddd971f:::data_log.91:head PG: 6.f8e9bbbf (6.1f) Key count: 432850 Size (bytes): 104418869
> > /var/log/ceph/ceph.log:2025-03-12T14:49:14.398480+0530 osd.3 (osd.3) 55 : cluster [WRN] Large omap object found. Object: 6:7dc404fa:::data_log.90:head PG: 6.5f2023be (6.1e) Key count: 414521 Size (bytes): 100396561
> > /var/log/ceph/ceph.log:2025-03-12T14:50:00.000484+0530 mon.drhost1 (mon.0) 207423 : cluster [WRN] Search the cluster log for 'Large omap object found' for more details.
> >
> > Regards,
> > Danish

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx