Re: RGW how to delete orphans

Hello

I was hoping to follow up on this email and ask whether Florian managed to get to the bottom of this.

I have a case where I believe my RGW bucket pool is using too much space. The ceph df command shows over 16TB used, whereas summing the bucket stats gives a total of only about 6TB. So it seems that roughly 10TB is wasted somewhere, and I would like to find out how to reclaim it.
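
For anyone who wants to reproduce the comparison, something along these lines should work. This is only a rough sketch: the pool name matches my setup, and the jq path follows Florian's loop further down in the thread.

# pool-level usage as reported by ceph
ceph df | grep rgw.buckets

# total size of all buckets as reported by RGW, converted from KB to TB
for b in $(radosgw-admin bucket list | jq -r '.[]'); do
    radosgw-admin bucket stats --bucket="$b" | jq '.usage | ."rgw.main" | .size_kb'
done | awk '{ sum += $1 } END { print sum/1024/1024/1024 " TB" }'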

I am running "ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)" on all cluster nodes (server and client side). I have a total of 4 OSD servers with 48 OSDs, including a combination of SSD and SAS drives for different pools.

I started the "radosgw-admin orphans find --pool=.rgw.buckets --job-id=find1 --num-shards=64 --yes-i-really-mean-it" command about 2 weeks ago, and the only output I see from it is similar to this:

storing 20 entries at orphan.scan.find1.linked.50
storing 28 entries at orphan.scan.find1.linked.16

The command is still running, and I can see an increase of about 5K IOPS in the cluster's throughput since it started. However, I can't find any indication of its progress, nor do I see an increase in the RGW pool usage.
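
For reference, these are the kinds of checks I can think of running; just a sketch, and the log pool name .log is a guess for my legacy-named setup (newer deployments use something like default.rgw.log).

# show the current stage/shard/marker of the scan
radosgw-admin orphans list-jobs --extra-info

# the scan appears to store its intermediate results as
# orphan.scan.<job-id>.* objects in the RGW log pool
rados -p .log ls | grep '^orphan.scan.find1' | sort | head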

Can anyone suggest the next steps, please?

Cheers

Andrei

----- Original Message -----
> From: "Florian Engelmann" <florian.engelmann@xxxxxxxxxxxx>
> To: "Andreas Calminder" <andreas.calminder@xxxxxxxxxx>, "Christian Wuerdig" <christian.wuerdig@xxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Friday, 26 October, 2018 11:28:19
> Subject: Re:  RGW how to delete orphans

> Hi,
> 
> We've got the same problem here. Our 12.2.5 RadosGWs crashed
> (unnoticed by us) about 30,000 times during ongoing multipart uploads.
> After a couple of days we ended up with:
> 
> NAME                   ID QUOTA OBJECTS QUOTA BYTES USED   %USED MAX AVAIL OBJECTS  DIRTY  READ    WRITE  RAW USED
> xx-1.rgw.buckets.data  6  N/A           N/A         116TiB 87.22 17.1TiB   36264870 36.26M 3.63GiB 148MiB 194TiB
> 
> That is 116TiB of data (194TiB raw), while the sum of all bucket sizes is only:
> 
> for i in $(radosgw-admin bucket list | jq -r '.[]'); do
>     radosgw-admin bucket stats --bucket="$i" | jq '.usage | ."rgw.main" | .size_kb'
> done | awk '{ SUM += $1 } END { print SUM/1024/1024/1024 }'  # KB -> TiB
> 
> 46.0962
> 
> 116 - 46 = 70TB
> 
> So 70TB of objects are orphans, right?
> 
> And there are 36,264,870 objects in our rgw.buckets.data pool.
> 
> So we started:
> 
> radosgw-admin orphans list-jobs --extra-info
> [
>     {
>         "orphan_search_state": {
>             "info": {
>                 "orphan_search_info": {
>                     "job_name": "check-orph",
>                     "pool": "zh-1.rgw.buckets.data",
>                     "num_shards": 64,
>                     "start_time": "2018-10-10 09:01:14.746436Z"
>                 }
>             },
>             "stage": {
>                 "orphan_search_stage": {
>                     "search_stage": "iterate_bucket_index",
>                     "shard": 0,
>                     "marker": ""
>                 }
>             }
>         }
>     }
> ]
> 
> We are writing stdout to orphans.txt.
> 
> I am not sure how to interpret the output, but:
> 
> cat orphans.txt | awk '/^storing / { SUM += $2} END { print SUM }'
> 2145042765
> 
> So how should we interpret output lines like these:
> ...
> storing 16 entries at orphan.scan.check-orph.linked.62
> storing 19 entries at orphan.scan.check-orph.linked.63
> storing 13 entries at orphan.scan.check-orph.linked.0
> storing 13 entries at orphan.scan.check-orph.linked.1
> ...
> 
> Is it something like:
> 
> "I am storing 16 'healthy' object names to the shard
> orphan.scan.check-orph.linked.62"?
> 
> Are the entries objects? What exactly is meant by "entries"? Where do
> those "shards" live? Are they files or objects in a pool? How can we
> track the progress of "orphans find"? Is the job still doing the right
> thing? What is the estimated runtime on SATA disks with 194TB raw?
> 
> The orphan find command has already stored 2,145,042,765 (more than 2
> billion) "entries", while there are "only" 36 million objects in the pool.
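
(A per-shard tally of the captured output can make a number like this easier to sanity-check; just a sketch based on the "storing N entries at ..." format shown above:)

awk '/^storing/ { count[$NF] += $2 } END { for (s in count) print s, count[s] }' orphans.txt | sort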
> 
> Is the process still healthy and doing the right thing?
> 
> All the best,
> Florian
> 
> 
> 
> 
> 
> Am 10/3/17 um 10:48 AM schrieb Andreas Calminder:
>> The output, to stdout, is something like leaked: $objname. Am I supposed
>> to pipe it to a log, grep for leaked: and pipe it to rados delete? Or am
>> I supposed to dig around in the log pool to try and find the objects
>> there? The information available is quite vague. Maybe Yehuda can shed
>> some light on this issue?
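
(For what it's worth, the manual cleanup described here would look roughly like the sketch below. The data pool name and log file are placeholders, and the "leaked:" list should be reviewed carefully before deleting anything.)

# resume/finish the scan and capture stdout; the "leaked: <object>" lines
# appear once the final comparison stage runs
radosgw-admin orphans find --pool=<data-pool> --job-id=<job-id> > orphans.log

# extract the object names flagged as leaked and remove them one by one
grep '^leaked:' orphans.log | sed 's/^leaked: //' | while read -r obj; do
    rados -p <data-pool> rm "$obj"
done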
>> 
>> Best regards,
>> /Andreas
>> 
>> On 3 Oct 2017 06:25, "Christian Wuerdig" <christian.wuerdig@xxxxxxxxx> wrote:
>> 
>>     yes, at least that's how I'd interpret the information given in this
>>     thread:
>>     http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016521.html
>> 
>>     On Tue, Oct 3, 2017 at 1:11 AM, Webert de Souza Lima
>>     <webert.boss@xxxxxxxxx> wrote:
>>      > Hey Christian,
>>      >
>>      >> On 29 Sep 2017 12:32 a.m., "Christian Wuerdig"
>>      >> <christian.wuerdig@xxxxxxxxx> wrote:
>>      >>>
>>      >>> I'm pretty sure the orphan find command does exactly just that -
>>      >>> finding orphans. I remember some emails on the dev list where
>>     Yehuda
>>      >>> said he wasn't 100% comfortable of automating the delete just yet.
>>      >>> So the purpose is to run the orphan find tool and then delete the
>>      >>> orphaned objects once you're happy that they all are actually
>>      >>> orphaned.
>>      >>>
>>      >
>>      > So what you mean is that one should manually remove the objects
>>      > listed in the output?
>>      >
>>      >
>>      > Regards,
>>      >
>>      > Webert Lima
>>      > DevOps Engineer at MAV Tecnologia
>>      > Belo Horizonte - Brasil
>>      >
>>      >
>> 
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


