Re: Removing orphaned radosgw bucket indexes from pool

On 11/29/18 6:58 PM, Bryan Stillwell wrote:
> Wido,
> 
> I've been looking into this large omap objects problem on a couple of our clusters today and came across your script during my research.
> 
> The script has been running for a few hours now and I'm already over 100,000 'orphaned' objects!
> 
> It appears that ever since upgrading to Luminous (12.2.5 initially, followed by 12.2.8), this cluster has been resharding the large bucket indexes at least once a day and not cleaning up the previous bucket indexes:
> 
> for instance in $(radosgw-admin metadata list bucket.instance | jq -r '.[]' | grep go-test-dashboard); do
>   mtime=$(radosgw-admin metadata get bucket.instance:${instance} | grep mtime)
>   num_shards=$(radosgw-admin metadata get bucket.instance:${instance} | grep num_shards)
>   echo "${instance}: ${mtime} ${num_shards}"
> done | column -t | sort -k3
> go-test-dashboard:default.188839135.327804:  "mtime":  "2018-06-01  22:35:28.693095Z",  "num_shards":  0,
> go-test-dashboard:default.617828918.2898:    "mtime":  "2018-06-02  22:35:40.438738Z",  "num_shards":  46,
> go-test-dashboard:default.617828918.4:       "mtime":  "2018-06-02  22:38:21.537259Z",  "num_shards":  46,
> go-test-dashboard:default.617663016.10499:   "mtime":  "2018-06-03  23:00:04.185285Z",  "num_shards":  46,
> [...snip...]
> go-test-dashboard:default.891941432.342061:  "mtime":  "2018-11-28  01:41:46.777968Z",  "num_shards":  7,
> go-test-dashboard:default.928133068.2899:    "mtime":  "2018-11-28  20:01:49.390237Z",  "num_shards":  46,
> go-test-dashboard:default.928133068.5115:    "mtime":  "2018-11-29  01:54:17.788355Z",  "num_shards":  7,
> go-test-dashboard:default.928133068.8054:    "mtime":  "2018-11-29  20:21:53.733824Z",  "num_shards":  7,
> go-test-dashboard:default.891941432.359004:  "mtime":  "2018-11-29  20:22:09.201965Z",  "num_shards":  46,
> 
> The num_shards is typically around 46, but looking at all 288 instances of that bucket index, it has varied between 3 and 62 shards.
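A side note on the inspection loop above: the same listing can be produced with a single "radosgw-admin metadata get" call per instance instead of two. A minor variant of that loop, offered only as an untested sketch:

for instance in $(radosgw-admin metadata list bucket.instance | jq -r '.[]' | grep go-test-dashboard); do
  # fetch the instance metadata once and pull both fields from it
  md=$(radosgw-admin metadata get bucket.instance:${instance})
  echo "${instance}: $(echo "${md}" | grep mtime) $(echo "${md}" | grep num_shards)"
done | column -t | sort -k3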
> 
> Have you figured anything more out about this since you posted this originally two weeks ago?
> 
> Thanks,
> Bryan
> 
> From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Wido den Hollander <wido@xxxxxxxx>
> Date: Thursday, November 15, 2018 at 5:43 AM
> To: Ceph Users <ceph-users@xxxxxxxx>
> Subject: Removing orphaned radosgw bucket indexes from pool
> 
> Hi,
> 
> Recently we've seen multiple messages on the mailing lists about people
> seeing HEALTH_WARN due to large OMAP objects on their cluster. This is
> because, starting with 12.2.6, OSDs warn about such objects.
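For anyone tracking down which objects triggered that warning, the health check output points at the cluster log; roughly (the log path below is the default and may differ on your deployment):

ceph health detail                                      # shows the LARGE_OMAP_OBJECTS check and the affected pool(s)
grep 'Large omap object found' /var/log/ceph/ceph.log   # OSDs log the object name, key count and size here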
> 
> I've got multiple people asking me the same questions and I've done some
> digging around.
> 
> Somebody on the ML wrote this script:
> 
> for bucket in `radosgw-admin metadata list bucket | jq -r '.[]' | sort`; do
>   actual_id=`radosgw-admin bucket stats --bucket=${bucket} | jq -r '.id'`
>   for instance in `radosgw-admin metadata list bucket.instance | jq -r '.[]' | grep ${bucket}: | cut -d ':' -f 2`
>   do
>     if [ "$actual_id" != "$instance" ]
>     then
>       radosgw-admin bi purge --bucket=${bucket} --bucket-id=${instance}
>       radosgw-admin metadata rm bucket.instance:${bucket}:${instance}
>     fi
>   done
> done
> 
> That partially works, but it does not catch 'orphaned' objects in the index pool.
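A cautious way to use that script is to print the purge/rm commands first and review them before executing anything. The same logic as a dry-run sketch (the grep is anchored here so a bucket name that is a prefix of another bucket name does not match both):

#!/bin/bash
# Dry run: print, but do not execute, the cleanup commands for stale bucket instances.
for bucket in $(radosgw-admin metadata list bucket | jq -r '.[]' | sort); do
  actual_id=$(radosgw-admin bucket stats --bucket=${bucket} | jq -r '.id')
  for instance in $(radosgw-admin metadata list bucket.instance | jq -r '.[]' | grep "^${bucket}:" | cut -d ':' -f 2); do
    if [ "${actual_id}" != "${instance}" ]; then
      echo radosgw-admin bi purge --bucket=${bucket} --bucket-id=${instance}
      echo radosgw-admin metadata rm bucket.instance:${bucket}:${instance}
    fi
  done
done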
> 
> So I wrote my own script [0]:
> 
> #!/bin/bash
> INDEX_POOL=$1
> 
> if [ -z "$INDEX_POOL" ]; then
>     echo "Usage: $0 <index pool>"
>     exit 1
> fi
> 
> INDEXES=$(mktemp)
> METADATA=$(mktemp)
> 
> trap "rm -f ${INDEXES} ${METADATA}" EXIT
> 
> radosgw-admin metadata list bucket.instance|jq -r '.[]' > ${METADATA}
> rados -p ${INDEX_POOL} ls > $INDEXES
> 
> for OBJECT in $(cat ${INDEXES}); do
>     MARKER=$(echo ${OBJECT}|cut -d '.' -f 3,4,5)
>     grep ${MARKER} ${METADATA} > /dev/null
>     if [ "$?" -ne 0 ]; then
>         echo $OBJECT
>     fi
> done
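The script from [0] takes the index pool name as its only argument and just prints candidates, so the output can be saved for review; for example (the script, pool and output file names below are placeholders):

bash list-orphaned-indexes.sh default.rgw.buckets.index > possible-orphans.txt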
> 
> It does not remove anything, but for example, it returns these objects:
> 
> .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10406917.5752
> .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162
> .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6186
> 
> The output of:
> 
> $ radosgw-admin metadata list|jq -r '.[]'
> 
> Does not contain:
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10406917.5752
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6186
> 
> So to me these objects do not seem to be tied to any bucket; they appear to
> be leftovers which were never cleaned up.
> 
> For example, I see these objects tied to a bucket:
> 
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6160
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6188
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6167
> 
> But notice the difference: 6160, 6188 and 6167 are still tied to a bucket, while 6162 and 6186 are not.
> 
> Before I remove these objects I want to verify with other users if they
> see the same and if my thinking is correct.
> 
> Wido
> 
> [0]: https://gist.github.com/wido/6650e66b09770ef02df89636891bef04
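For completeness: once objects from such a list have been confirmed to be orphans (the verification Wido is asking for above), removing them would presumably come down to rados rm in the index pool. A hedged sketch, reusing the placeholder names from the usage example earlier:

INDEX_POOL=default.rgw.buckets.index        # placeholder, use your actual index pool
while read OBJECT; do
    # optional safety net: dump the omap keys before deleting the index object
    rados -p ${INDEX_POOL} listomapkeys ${OBJECT} > omapkeys.${OBJECT}
    rados -p ${INDEX_POOL} rm ${OBJECT}
done < possible-orphans.txt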

This is a known issue, and there are multiple commits on the upstream
luminous branch designed to address it in a variety of ways: making
resharding more robust, having resharding clean up old shards
automatically, and adding administrative command-line support to manually
clean up old shards.

These will all be included in the next luminous release.

Eric
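The administrative command-line support mentioned above later shipped as a "reshard stale-instances" subcommand of radosgw-admin. If your build has it, the workflow is roughly as follows (command names quoted from memory, so check radosgw-admin help on your version; the rm variant is not recommended in multisite setups):

radosgw-admin reshard stale-instances list   # list bucket instance entries left behind by resharding
radosgw-admin reshard stale-instances rm     # remove them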
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



