Re: RGW how to delete orphans

Hi,

we've got the same problem here. Our 12.2.5 RadosGWs crashed (unnoticed by us) about 30,000 times during ongoing multipart uploads. After a couple of days we ended up with:

NAME                  ID QUOTA OBJECTS QUOTA BYTES USED   %USED MAX AVAIL OBJECTS  DIRTY  READ    WRITE  RAW USED
xx-1.rgw.buckets.data 6  N/A           N/A         116TiB 87.22 17.1TiB   36264870 36.26M 3.63GiB 148MiB 194TiB

116 TiB of data (194 TiB raw), while summing the per-bucket usage gives only:

for i in $(radosgw-admin bucket list | jq -r '.[]'); do radosgw-admin bucket stats --bucket=$i | jq '.usage | ."rgw.main" | .size_kb' ; done | awk '{ SUM += $1} END { print SUM/1024/1024/1024 }'

46.0962

116 TiB - 46 TiB = 70 TiB

So roughly 70 TiB of objects are orphans, right?
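Since the crashes happened during ongoing multipart uploads, it may be fairer to sum all usage categories reported by bucket stats (not only "rgw.main"); a rough sketch of what I mean:

for b in $(radosgw-admin bucket list | jq -r '.[]'); do
    # sum size_kb over every usage category of the bucket (rgw.main, rgw.multimeta, ...)
    radosgw-admin bucket stats --bucket="$b" | jq '[.usage[].size_kb // 0] | add // 0'
done | awk '{ SUM += $1 } END { print SUM/1024/1024/1024 " TiB" }'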

And there are 36,264,870 objects in our rgw.buckets.data pool.
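(That object count is the OBJECTS column from the pool stats above; a direct listing should give roughly the same number, although it takes a long time on a pool this size:)

rados -p xx-1.rgw.buckets.data ls | wc -l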

So we started an orphan scan; the job currently looks like this:

radosgw-admin orphans list-jobs --extra-info
[
    {
        "orphan_search_state": {
            "info": {
                "orphan_search_info": {
                    "job_name": "check-orph",
                    "pool": "zh-1.rgw.buckets.data",
                    "num_shards": 64,
                    "start_time": "2018-10-10 09:01:14.746436Z"
                }
            },
            "stage": {
                "orphan_search_stage": {
                    "search_stage": "iterate_bucket_index",
                    "shard": 0,
                    "marker": ""
                }
            }
        }
    }
]

We are writing the stdout of "radosgw-admin orphans find" to: orphans.txt
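For completeness, the scan was started with something along these lines (pool, job name and shard count are the ones shown in the job info above):

radosgw-admin orphans find --pool=zh-1.rgw.buckets.data --job-id=check-orph --num-shards=64 > orphans.txt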

I am not sure how to interpret the output, but:

cat orphans.txt | awk '/^storing / { SUM += $2} END { print SUM }'
2145042765

So how should I interpret output lines like these:
...
storing 16 entries at orphan.scan.check-orph.linked.62
storing 19 entries at orphan.scan.check-orph.linked.63
storing 13 entries at orphan.scan.check-orph.linked.0
storing 13 entries at orphan.scan.check-orph.linked.1
...

Does it mean something like:

"I am storing 16 'healthy' object names in the shard orphan.scan.check-orph.linked.62"?

Are those entries objects? What exactly is meant by "entries"? Where do those "shards" live - are they files or objects in a pool? How can I track the progress of "orphans find"? Is the job still doing the right thing? And what run time should we expect on SATA disks with 194 TiB raw?
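My guess is that those "linked" shards are not files but RADOS objects in the zone's log pool (the names in the output certainly look like object names). If that is right, something like this should list them - the log pool name here is only an assumption based on the default naming:

rados -p zh-1.rgw.log ls | grep '^orphan\.scan\.check-orph\.'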

The orphans find command has already stored 2,145,042,765 (more than 2 billion) "entries", while there are "only" 36 million objects in the pool...
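To see how those entries are spread over the target shards, I am also counting them per destination, roughly like this:

awk '/^storing / { count[$5] += $2 } END { for (s in count) print s, count[s] }' orphans.txt | sort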

Is the process still healthy and doing the right thing?

All the best,
Florian





On 10/3/17 at 10:48 AM, Andreas Calminder wrote:
The output, to stdout, is something like leaked: $objname. Am I supposed to pipe it to a log, grep for leaked: and pipe it to rados delete? Or am I supposed to dig around in the log pool to try and find the objects there? The information available is quite vague. Maybe Yehuda can shed some light on this issue?
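If it is the former, I would guess the cleanup looks roughly like this (the log file name and pool name here are just placeholders):

grep '^leaked:' orphans.log | awk '{ print $2 }' | while read obj; do
    # removes each leaked object from the data pool; swapping 'rm' for an echo first makes a sensible dry run
    rados -p default.rgw.buckets.data rm "$obj"
done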

Best regards,
/Andreas

On 3 Oct 2017 06:25, "Christian Wuerdig" <christian.wuerdig@xxxxxxxxx> wrote:

    yes, at least that's how I'd interpret the information given in this
    thread:
    http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016521.html

    On Tue, Oct 3, 2017 at 1:11 AM, Webert de Souza Lima
     <webert.boss@xxxxxxxxx> wrote:
     > Hey Christian,
     >
     >> On 29 Sep 2017 12:32 a.m., "Christian Wuerdig"
     >> <christian.wuerdig@xxxxxxxxx> wrote:
     >>>
     >>> I'm pretty sure the orphan find command does exactly that -
     >>> finding orphans. I remember some emails on the dev list where Yehuda
     >>> said he wasn't 100% comfortable with automating the delete just yet.
     >>> So the purpose is to run the orphan find tool and then delete the
     >>> orphaned objects once you're happy that they all are actually
     >>> orphaned.
     >>>
     >
     > so what you mean is that one should manually remove the objects
     > listed in the output?
     >
     >
     > Regards,
     >
     > Webert Lima
     > DevOps Engineer at MAV Tecnologia
     > Belo Horizonte - Brasil
     >
     >



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
