Hi,
Having spent some time on the below issue, here are the
steps I took to resolve the "Large omap objects" warning.
Hopefully this will help others who find themselves in this
situation.
I got the implicated object ID and OSD ID from the ceph
cluster logfile on the mon. I then went to the host containing
that OSD and identified the affected PG by running the
following and looking at which PG had started and completed a
deep-scrub around the time the warning was logged:
grep -C 200 Large /var/log/ceph/ceph-osd.*.log | egrep '(Large omap|deep-scrub)'
If the bucket had not been sharded sufficiently (i.e. the
cluster log showed a "Key count" or "Size" over the
thresholds), I ran through the manual sharding procedure
(shown here:
https://tracker.ceph.com/issues/24457#note-5)
and then purged the old bucket index:
radosgw-admin bi purge --bucket ${bucketname} --bucket-id ${old_bucket_id}
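For anyone following along, the resharding step that precedes
the purge is roughly the following (shard count and bucket name
are placeholders):
# note the current bucket_id/marker, which becomes ${old_bucket_id} above
radosgw-admin bucket stats --bucket ${bucketname}
# write a new, more finely sharded index for the bucket
radosgw-admin bucket reshard --bucket ${bucketname} --num-shards ${new_num_shards}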
I then issued a ceph pg deep-scrub against the PG that had
contained the Large omap object.
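That is simply:
ceph pg deep-scrub ${pgid}
with the PG taken from the grep output above.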
Once I had completed this procedure, my Large omap object
warnings went away and the cluster returned to HEALTH_OK.
However, our radosgw bucket index pool now seems to be using
substantially more space than previously. Having looked
initially at this bug (in particular the first comment), I was
able to extract a number of bucket indexes that had apparently
been resharded, and removed the legacy index using
radosgw-admin bi purge --bucket ${bucket} --bucket-id ${marker}.
I am still able to perform a radosgw-admin metadata get
bucket.instance:${bucket}:${marker} successfully, yet when I
now run rados -p .rgw.buckets.index ls | grep ${marker}
nothing is returned. Even after this, we were still seeing
extremely high disk usage on the OSDs containing the bucket
indexes (we have a dedicated pool for this). I then modified
the one-liner referenced in the previous link as follows:
grep -E '"bucket"|"id"|"marker"' bucket-stats.out | awk -F
":" '{print $2}' | tr -d '",' | while read -r bucket; do read
-r id; read -r marker; [ "$id" == "$marker" ] && true
|| NEWID=`radosgw-admin --id rgw.ceph-rgw-1 metadata get
bucket.instance:${bucket}:${marker} | python -c 'import sys,
json; print
json.load(sys.stdin)["data"]["bucket_info"]["new_bucket_instance_id"]'`;
while [ ${NEWID} ]; do if [ "${NEWID}" != "${marker}" ]
&& [ ${NEWID} != ${bucket} ] ; then echo "$bucket
$NEWID"; fi; NEWID=`radosgw-admin --id rgw.ceph-rgw-1 metadata
get bucket.instance:${bucket}:${NEWID} | python -c 'import
sys, json; print
json.load(sys.stdin)["data"]["bucket_info"]["new_bucket_instance_id"]'`;
done; done > buckets_with_multiple_reindexes2.txt
This loops through the buckets that have a different
marker/bucket_id, checks whether a new_bucket_instance_id is
present, and if so follows the chain until there is no longer
a "new_bucket_instance_id". After letting this complete, it
suggests that I have over 5000 indexes for 74 buckets; some of
these buckets apparently have more than 100 indexes each.
~# awk '{print $1}' buckets_with_multiple_reindexes2.txt | uniq | wc -l
74
~# wc -l buckets_with_multiple_reindexes2.txt
5813 buckets_with_multiple_reindexes2.txt
Should I be OK to loop through these indexes and remove any
with a reshard_status of 2 and a new_bucket_instance_id that
does not match the bucket_instance_id returned by the command:
radosgw-admin bucket stats --bucket ${bucket}
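For the sake of discussion, the kind of loop I have in mind is
something like the sketch below (echo-only for now, reusing the
list generated above and leaving out the reshard_status check
for brevity):
# sketch only: compare each stale instance against the live bucket_id
while read -r bucket stale_id; do
  live_id=$(radosgw-admin bucket stats --bucket "${bucket}" |
    python -c 'import sys, json; print json.load(sys.stdin)["id"]')
  if [ "${stale_id}" != "${live_id}" ]; then
    echo "would purge ${bucket} instance ${stale_id} (live id ${live_id})"
    # radosgw-admin bi purge --bucket "${bucket}" --bucket-id "${stale_id}"
    # radosgw-admin metadata rm "bucket.instance:${bucket}:${stale_id}"
  fi
done < buckets_with_multiple_reindexes2.txt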
I'd ideally like to get to a point where I can turn dynamic
sharding back on safely for this cluster.
Thanks for any assistance; let me know if there's any more
information I should provide.
Chris
Hi,
Thanks for the response - I am still unsure as to
what will happen to the "marker" reference in the
bucket metadata, as this is the object that is being
detected as Large. Will the bucket generate a new
"marker" reference in the bucket metadata?
I've been reading this page to try and get a better
understanding of this, however I'm no clearer on it (or on
what the "marker" is used for), or on why there are multiple
separate "bucket_id" values (with different mtime stamps)
that all show as having the same number of shards.
If I were to remove the old bucket index, would I just be
looking to execute:
rados -p .rgw.buckets.index rm .dir.default.5689810.107
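(I assume something like the below would confirm beforehand
that this object is the one holding the large omap listing:)
rados -p .rgw.buckets.index listomapkeys .dir.default.5689810.107 | wc -l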
Is the differing marker/bucket_id that was found in the other
buckets also an indicator? As I say, there are a good number
of these; here are some additional examples, though these
aren't necessarily reporting as large omap objects:
"BUCKET1", "default.281853840.479",
"default.105206134.5",
"BUCKET2", "default.364663174.1",
"default.349712129.3674",
Checking these other buckets, they are exhibiting
the same sort of symptoms as the first (multiple
instances of radosgw-admin metadata get showing what
seem to be multiple resharding processes being run,
with different mtimes recorded).
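(I assume the full set of index instances per bucket can be
enumerated with something like the below, grepping the
instance metadata listing for the bucket name?)
radosgw-admin metadata list bucket.instance | grep BUCKET1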
Thanks
On Thu, 4 Oct 2018 at 16:21 Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
Hi,
Ceph version: Luminous 12.2.7
Following upgrading to Luminous from Jewel, we have been stuck with a
cluster in HEALTH_WARN state that is complaining about large omap objects.
These all seem to be located in our .rgw.buckets.index pool. We've
disabled auto resharding on bucket indexes due to apparent looping issues
after our upgrade. We've reduced the number of reported large omap
objects by initially increasing the following value:
~# ceph daemon mon.ceph-mon-1 config get
osd_deep_scrub_large_omap_object_value_sum_threshold
{
"osd_deep_scrub_large_omap_object_value_sum_threshold": "2147483648"
}
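(For reference, the runtime change amounts to something like the
following; a matching ceph.conf entry would be needed for it to persist
across restarts:)
ceph tell osd.* injectargs '--osd_deep_scrub_large_omap_object_value_sum_threshold=2147483648'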
However we're still getting a warning about a single large omap object,
which I don't believe is related to an unsharded index - here's the
log entry:
2018-10-01 13:46:24.427213 osd.477 osd.477 172.26.216.6:6804/2311858 8482 :
cluster [WRN] Large omap object found. Object:
15:333d5ad7:::.dir.default.5689810.107:head Key count: 17467251 Size
(bytes): 4458647149
The object in the logs is the "marker" object, rather than the bucket_id -
I've put some details regarding the bucket here:
https://pastebin.com/hW53kTxL
The bucket limit check shows that the index is sharded, so I think this
might be related to versioning, although I was unable to get confirmation
that the bucket in question has versioning enabled through the aws cli
(snipped debug output below):
2018-10-02 15:11:17,530 - MainThread - botocore.parsers - DEBUG - Response
headers: {'date': 'Tue, 02 Oct 2018 14:11:17 GMT', 'content-length': '137',
'x-amz-request-id': 'tx0000000000000020e3b15-005bb37c85-15870fe0-default',
'content-type': 'application/xml'}
2018-10-02 15:11:17,530 - MainThread - botocore.parsers - DEBUG - Response
body:
<?xml version="1.0" encoding="UTF-8"?><VersioningConfiguration xmlns="
http://s3.amazonaws.com/doc/2006-03-01/"></VersioningConfiguration>
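(The query itself was just the standard versioning check, something like
the below with our RGW endpoint substituted; the endpoint URL here is a
placeholder:)
aws s3api get-bucket-versioning --bucket CLIENTBUCKET --endpoint-url https://rgw.example.com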
After dumping the contents of the large omap object mentioned above into a
file, it does seem to be a simple listing of the bucket contents,
potentially an old index:
~# wc -l omap_keys
17467251 omap_keys
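(The dump itself was essentially a listomapkeys of the object named in the
warning, i.e. something like:)
rados -p .rgw.buckets.index listomapkeys .dir.default.5689810.107 > omap_keys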
This is approximately 5 million below the currently reported number of
objects in the bucket.
When running the commands listed here:
http://tracker.ceph.com/issues/34307#note-1
The problematic bucket is listed in the output (along with 72 other
buckets):
"CLIENTBUCKET", "default.294495648.690", "default.5689810.107"
As this tests for bucket_id and marker fields not matching to print out the
information, is the implication here that both of these should match in
order to fully migrate to the new sharded index?
I was able to do a "metadata get" using what appears to be the old index
object ID, which seems to support this (there's a "new_bucket_instance_id"
field, containing a newer "bucket_id" and reshard_status is 2, which seems
to suggest it has completed).
I am able to take the "new_bucket_instance_id" and get additional metadata
about the bucket; each time I do this I get a slightly newer
"new_bucket_instance_id", until it stops suggesting updated indexes.
It's probably worth pointing out that when going through this process the
final "bucket_id" doesn't match the one that I currently get when running
'radosgw-admin bucket stats --bucket "CLIENTBUCKET"', even though it also
suggests that no further resharding has been done as "reshard_status" = 0
and "new_bucket_instance_id" is blank. The output is available to view
here:
https://pastebin.com/g1TJfKLU
It would be useful if anyone can offer some clarification on how to proceed
from this situation, identifying and removing any old/stale indexes from
the index pool (if that is the case), as I've not been able to spot
anything in the archives.
If there's any further information that is needed for additional context
please let me know.
When your bucket is automatically resharded, in some
cases the old big index is not deleted - this is your
large omap object.
This index is safe to delete. Also look at [1].
[1] https://tracker.ceph.com/issues/24457
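For example, something along these lines for the instance in the
warning above, after confirming it is no longer the live index
(a sketch, not a verified procedure):
# check the live "id" reported by bucket stats does not match the stale instance
radosgw-admin bucket stats --bucket CLIENTBUCKET | grep '"id"'
radosgw-admin bi purge --bucket CLIENTBUCKET --bucket-id default.5689810.107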