On 11/29/18 6:58 PM, Bryan Stillwell wrote:
> Wido,
>
> I've been looking into this large omap objects problem on a couple of
> our clusters today and came across your script during my research.
>
> The script has been running for a few hours now and I'm already over
> 100,000 'orphaned' objects!
>
> It appears that ever since upgrading to Luminous (12.2.5 initially,
> followed by 12.2.8) this cluster has been resharding the large bucket
> indexes at least once a day and not cleaning up the previous bucket
> indexes:
>
> for instance in $(radosgw-admin metadata list bucket.instance | jq -r '.[]' | grep go-test-dashboard); do
>   mtime=$(radosgw-admin metadata get bucket.instance:${instance} | grep mtime)
>   num_shards=$(radosgw-admin metadata get bucket.instance:${instance} | grep num_shards)
>   echo "${instance}: ${mtime} ${num_shards}"
> done | column -t | sort -k3
> go-test-dashboard:default.188839135.327804:  "mtime": "2018-06-01 22:35:28.693095Z",  "num_shards": 0,
> go-test-dashboard:default.617828918.2898:    "mtime": "2018-06-02 22:35:40.438738Z",  "num_shards": 46,
> go-test-dashboard:default.617828918.4:       "mtime": "2018-06-02 22:38:21.537259Z",  "num_shards": 46,
> go-test-dashboard:default.617663016.10499:   "mtime": "2018-06-03 23:00:04.185285Z",  "num_shards": 46,
> [...snip...]
> go-test-dashboard:default.891941432.342061:  "mtime": "2018-11-28 01:41:46.777968Z",  "num_shards": 7,
> go-test-dashboard:default.928133068.2899:    "mtime": "2018-11-28 20:01:49.390237Z",  "num_shards": 46,
> go-test-dashboard:default.928133068.5115:    "mtime": "2018-11-29 01:54:17.788355Z",  "num_shards": 7,
> go-test-dashboard:default.928133068.8054:    "mtime": "2018-11-29 20:21:53.733824Z",  "num_shards": 7,
> go-test-dashboard:default.891941432.359004:  "mtime": "2018-11-29 20:22:09.201965Z",  "num_shards": 46,
>
> The num_shards is typically around 46, but looking at all 288 instances
> of that bucket index, it has varied between 3 and 62 shards.
>
> Have you figured anything more out about this since you posted this
> originally two weeks ago?
>
> Thanks,
> Bryan
>
> From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Wido den Hollander <wido@xxxxxxxx>
> Date: Thursday, November 15, 2018 at 5:43 AM
> To: Ceph Users <ceph-users@xxxxxxxx>
> Subject: Removing orphaned radosgw bucket indexes from pool
>
> Hi,
>
> Recently we've seen multiple messages on the mailing lists about people
> seeing HEALTH_WARN due to large OMAP objects on their cluster. This is
> because, starting with 12.2.6, OSDs warn about this.
>
> I've had multiple people ask me the same questions, so I've done some
> digging around.
>
> Somebody on the ML wrote this script:
>
> for bucket in `radosgw-admin metadata list bucket | jq -r '.[]' | sort`; do
>   actual_id=`radosgw-admin bucket stats --bucket=${bucket} | jq -r '.id'`
>   for instance in `radosgw-admin metadata list bucket.instance | jq -r '.[]' | grep ${bucket}: | cut -d ':' -f 2`
>   do
>     if [ "$actual_id" != "$instance" ]
>     then
>       radosgw-admin bi purge --bucket=${bucket} --bucket-id=${instance}
>       radosgw-admin metadata rm bucket.instance:${bucket}:${instance}
>     fi
>   done
> done
>
> That partially works, but it does not handle 'orphaned' objects in the
> index pool.
>
> So I wrote my own script [0]:
>
> #!/bin/bash
> INDEX_POOL=$1
>
> if [ -z "$INDEX_POOL" ]; then
>   echo "Usage: $0 <index pool>"
>   exit 1
> fi
>
> INDEXES=$(mktemp)
> METADATA=$(mktemp)
>
> trap "rm -f ${INDEXES} ${METADATA}" EXIT
>
> radosgw-admin metadata list bucket.instance | jq -r '.[]' > ${METADATA}
> rados -p ${INDEX_POOL} ls > ${INDEXES}
>
> for OBJECT in $(cat ${INDEXES}); do
>   MARKER=$(echo ${OBJECT} | cut -d '.' -f 3,4,5)
>   grep ${MARKER} ${METADATA} > /dev/null
>   if [ "$?" -ne 0 ]; then
>     echo $OBJECT
>   fi
> done
>
> It does not remove anything; it only reports. For example, it returns
> these objects:
>
> .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10406917.5752
> .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162
> .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6186
>
> The output of:
>
> $ radosgw-admin metadata list bucket.instance | jq -r '.[]'
>
> does not contain:
>
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10406917.5752
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6186
>
> So to me these objects do not seem to be tied to any bucket and appear
> to be leftovers that were never cleaned up.
>
> For comparison, I do see these objects tied to a bucket:
>
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6160
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6188
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6167
>
> Notice the difference: 6160, 6188 and 6167 are referenced, but 6162 and
> 6186 are not.
>
> Before I remove these objects I want to verify with other users whether
> they see the same thing and whether my thinking is correct.
>
> Wido
>
> [0]: https://gist.github.com/wido/6650e66b09770ef02df89636891bef04

This is a known issue, and there are multiple commits on the upstream
luminous branch designed to address it in a variety of ways: making
resharding more robust, having resharding clean up old shards
automatically, and adding administrative command-line support to
manually clean up old shards. These will all be included in the next
luminous release.

Eric

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
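
As a sanity check before deleting anything the script flags, one can inspect
a candidate index object directly. A minimal sketch, assuming a default index
pool name of default.rgw.buckets.index and reusing one of the markers from
Wido's example (substitute your own pool and object names):

# Confirm no bucket.instance metadata references the marker anymore.
radosgw-admin metadata list bucket.instance | jq -r '.[]' | grep 10289105.6162

# Look at the object and its omap keys before touching it.
rados -p default.rgw.buckets.index stat .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162
rados -p default.rgw.buckets.index listomapkeys .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162 | wc -l

# Only after confirming it is orphaned, remove the index object.
rados -p default.rgw.buckets.index rm .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162

A large omap key count on an unreferenced .dir. object is exactly what
triggers the HEALTH_WARN in the first place, so those are the objects worth
clearing out.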
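The administrative command-line support Eric mentions surfaced in later
luminous point releases as a radosgw-admin subcommand. A minimal sketch of
that workflow; the subcommand names here are an assumption based on those
later releases, so check your version's radosgw-admin help before relying on
them, and note that automatic removal is only intended for non-multisite
setups:

# List bucket index instances left behind by resharding.
radosgw-admin reshard stale-instances list

# Remove the stale instances (single-site clusters only).
radosgw-admin reshard stale-instances rm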