Re: ceph-users Digest, Vol 81, Issue 39 Re: RadosGW can't list objects when there are too many of them


 



Hi Arash,
  If the number of objects in a bucket is very large (on the order of millions), a paginated listing approach works better.
There are also certain RGW config options that control how big an RGW response can be (in terms of the number of objects returned); by default I believe this is 1000.
The code for paginated listing (the snippet can be adapted):

"
try:
    bucket_handle = s3_conn_src.get_bucket(bucket_name)
    marker = ''  # resume point; an empty string starts from the first key
    while True:
        keys = bucket_handle.get_all_keys(max_keys=1000, marker=marker)
        for k in keys:
            # do your operation on each key (object) here
            print(k.name)
            # update the marker so the next request resumes after this key
            marker = k.name
        if not keys.is_truncated:
            print("Breaking")
            break
except Exception as e:
    print(e)
"
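Since the original question used boto3, the same marker-style pagination can be sketched there with `list_objects_v2` continuation tokens. A minimal sketch (the function name and bucket below are illustrative placeholders, not from the original post):

```python
def iter_all_keys(s3, bucket):
    """Yield every key in a bucket, one page (at most 1000 keys) at a time."""
    kwargs = {"Bucket": bucket}
    while True:
        response = s3.list_objects_v2(**kwargs)
        for item in response.get("Contents", []):
            yield item["Key"]
        if not response.get("IsTruncated"):
            break  # no more pages
        # resume the listing where this page left off
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

With a real client this would be used as `for key in iter_all_keys(boto3.client('s3', ...), 'mybucket')`. boto3 also ships a built-in paginator, `s3.get_paginator('list_objects_v2')`, which runs the same loop internally.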

Thanks
Romit Misra




On Thu, Oct 17, 2019 at 4:18 PM <ceph-users-request@xxxxxxx> wrote:
Send ceph-users mailing list submissions to
        ceph-users@xxxxxxx

To subscribe or unsubscribe via email, send a message with subject or
body 'help' to
        ceph-users-request@xxxxxxx

You can reach the person managing the list at
        ceph-users-owner@xxxxxxx

When replying, please edit your Subject line so it is more specific
than "Re: Contents of ceph-users digest..."

Today's Topics:

   1. RadosGW cant list objects when there are too many of them
      (Arash Shams)
   2. Re: Recovering from a Failed Disk (replication 1) (Burkhard Linke)
   3. Re: RGW blocking on large objects (Paul Emmerich)
   4. Re: RadosGW cant list objects when there are too many of them
      (Paul Emmerich)
   5. Re: Recovering from a Failed Disk (replication 1) (Frank Schilder)


----------------------------------------------------------------------

Date: Thu, 17 Oct 2019 07:19:12 +0000
From: Arash Shams <ara4sh@xxxxxxxxxxx>
Subject: RadosGW cant list objects when there are too
        many of them
To: "ceph-users@xxxxxxx" <ceph-users@xxxxxxx>
Message-ID:  <LNXP265MB0508FF1F47CB5EA9C29219FA926D0@xxxxxxxxxxxxxxxxx
        P265.PROD.OUTLOOK.COM>

Dear All

I have a bucket with 5 million objects and I can't list the objects with
radosgw-admin bucket list --bucket=bucket | jq .[].name
or by listing files using boto3:

    s3 = boto3.client('s3',
                      endpoint_url=credentials['endpoint_url'],
                      aws_access_key_id=credentials['access_key'],
                      aws_secret_access_key=credentials['secret_key'])

    response = s3.list_objects_v2(Bucket=bucket_name)
    for item in response['Contents']:
        print(item['Key'])

What is the solution? How can I get a list of my objects?





------------------------------

Date: Thu, 17 Oct 2019 10:18:11 +0200
From: Burkhard Linke <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Recovering from a Failed Disk (replication
        1)
To: ceph-users@xxxxxxx
Message-ID: <f341533b-cc7a-865d-0440-79084e0c5707@xxxxxxxxxxxxxxxxxxxx
        i-giessen.de>

Hi,


On 10/17/19 5:56 AM, Ashley Merrick wrote:
> I think you're better off doing the dd method; you can export and import
> a PG at a time (ceph-objectstore-tool).
>
> But if the disk is failing, dd is probably your best method.


In case of hardware problems or broken sectors, I would recommend
'dd_rescue' instead of dd. It can handle broken sectors, automatic
retries, skipping etc.


You will definitely need a second disk to rescue to.
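For reference, a typical GNU ddrescue invocation looks like this (device names and the mapfile path are placeholders; double-check which disk is source and which is destination before running):

```shell
# Pass 1: copy the readable areas quickly, skipping bad sectors (-n = no scraping)
ddrescue -f -n /dev/sdX /dev/sdY rescue.map
# Pass 2: go back and retry the bad areas up to 3 times, using the same mapfile
ddrescue -f -r3 /dev/sdX /dev/sdY rescue.map
```

The mapfile lets the rescue resume after interruptions, which matters on a dying disk.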


Regards,

Burkhard




------------------------------

Date: Thu, 17 Oct 2019 11:50:37 +0200
From: Paul Emmerich <paul.emmerich@xxxxxxxx>
Subject: Re: RGW blocking on large objects
To: Robert LeBlanc <robert@xxxxxxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxx>
Message-ID:
        <CAD9yTbFSpxo1cMAAZ56YcwKxY2dcv0KRPcWYapT2fLmYhmrLkg@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="UTF-8"

On Thu, Oct 17, 2019 at 12:17 AM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>
> On Wed, Oct 16, 2019 at 2:50 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
> >
> > On Wed, Oct 16, 2019 at 11:23 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Oct 15, 2019 at 8:05 AM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Oct 14, 2019 at 2:58 PM Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
> > > > >
> > > > > Could the 4 GB GET limit saturate the connection from rgw to Ceph?
> > > > > Simple to test: just rate-limit the health check GET
> > > >
> > > > I don't think so, we have dual 25 Gbps in a LAG, so Ceph to RGW has
> > > > multiple paths, but we aren't balancing on port yet, so RGW to HAProxy
> > > > is probably limited to one link.
> > > >
> > > > > Did you increase "objecter inflight ops" and "objecter inflight op bytes"?
> > > > > You absolutely should adjust these settings for large RGW setups,
> > > > > defaults of 1024 and 100 MB are way too low for many RGW setups, we
> > > > > default to 8192 and 800MB
> > >
> > > On Nautilus the defaults already seem to be:
> > > objecter_inflight_op_bytes = 104857600 (default, i.e. 100 MiB)
> > > objecter_inflight_ops = 24576 (default)
> >
> > not sure where you got this from, but the default is still 1024 even
> > in master: https://github.com/ceph/ceph/blob/4774808cb2923f65f6919fe8be5f98917075cdd7/src/common/options.cc#L2288
>
> Looks like it is overridden in
> https://github.com/ceph/ceph/blob/4774808cb2923f65f6919fe8be5f98917075cdd7/src/rgw/rgw_main.cc#L187

You are right, this is new in Nautilus. The last time I had to play around
with these settings was indeed on a Mimic deployment.
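On releases where RGW does not override these defaults itself, the values can be raised in ceph.conf. A sketch matching the 8192 / 800 MB figures mentioned earlier in the thread (the section name is an example; adjust it to your RGW instance's name):

```
[client.rgw.gateway1]
objecter_inflight_ops = 8192
objecter_inflight_op_bytes = 838860800  # 800 MB
```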

> I'm just not
> understanding how your suggestions would help, the problem doesn't
> seem to be on the RADOS side (which it appears your tweaks target),
> but on the HTTP side as an HTTP health check takes a long time to come
> back when a big transfer is going on.

I was guessing a bottleneck on the RADOS side because you mentioned
that you had tried both civetweb and beast; it's somewhat unlikely to run
into the exact same problem with both.

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

------------------------------

Date: Thu, 17 Oct 2019 12:00:20 +0200
From: Paul Emmerich <paul.emmerich@xxxxxxxx>
Subject: Re: RadosGW cant list objects when there are too
        many of them
To: Arash Shams <ara4sh@xxxxxxxxxxx>
Cc: "ceph-users@xxxxxxx" <ceph-users@xxxxxxx>
Message-ID:
        <CAD9yTbGoCGPh=Ba5tQqBvCLu9uUKo0KmjR9E3ayo6DkR-E2bxQ@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="UTF-8"

Listing large buckets is slow due to S3 ordering requirements; it's
approximately O(n^2).
However, I wouldn't consider 5M objects a large bucket; it should go to
only ~50 shards, which should still perform reasonably. How fast are
your metadata OSDs?

Try --allow-unordered in radosgw-admin to get an unordered result,
which is only O(n) as you'd expect.
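For example (the bucket name is a placeholder; --max-entries merely pages the output):

```
radosgw-admin bucket list --bucket=mybucket --allow-unordered --max-entries=1000
```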

For boto3: I'm not sure if v2 object listing is available yet (I think
it has only been merged into master but has not yet made it into a
release?). It doesn't support unordered listing, but there has been
some work to implement that; I'm not sure about the current state.



Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Oct 17, 2019 at 9:19 AM Arash Shams <ara4sh@xxxxxxxxxxx> wrote:
>
> Dear All
>
> I have a bucket with 5 million Objects and I cant list objects with
> radosgw-admin bucket list --bucket=bucket | jq .[].name
> or listing files using boto3
>
>     s3 = boto3.client('s3',
>                       endpoint_url=credentials['endpoint_url'],
>                       aws_access_key_id=credentials['access_key'],
>                       aws_secret_access_key=credentials['secret_key'])
>
>     response = s3.list_objects_v2(Bucket=bucket_name)
>     for item in response['Contents']:
>         print(item['Key'])
>
> what is the solution ? how can I find list of my objects ?
>
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

------------------------------

Date: Thu, 17 Oct 2019 10:46:53 +0000
From: Frank Schilder <frans@xxxxxx>
Subject: Re: Recovering from a Failed Disk (replication
        1)
To: vladimir franciz blando <vladimir.blando@xxxxxxxxx>,
        "ceph-users@xxxxxxx"   <ceph-users@xxxxxxx>
Message-ID: <58e22bc6345b48718dbce7238a06e35d@xxxxxx>
Content-Type: text/plain; charset="utf-8"

You probably need to attempt a physical data rescue. Data access will be lost until done.

First, shut down the OSD to avoid any further damage to the disk.
Second, try ddrescue: repair the data on a copy if possible, then create a clone on a new disk from the copy.
If this doesn't help and you really need that last bit of data, you might need support from one of those companies that restore disk data with electron microscopy.

I successfully transferred OSDs between disks using ddrescue.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: vladimir franciz blando <vladimir.blando@xxxxxxxxx>
Sent: 17 October 2019 05:29:13
To: ceph-users@xxxxxxx
Subject: Recovering from a Failed Disk (replication 1)

Hi,

I have a non-ideal setup on one of my clusters: 3 Ceph nodes, but using replication 1 on all pools (don't ask me why replication 1, it's a long story).

So it has come to the point that a disk keeps crashing, possibly a hardware failure, and I need to recover from that.

What's the best option for me to recover the data from the failed disk and transfer it to the other healthy disks?

This cluster is using Firefly.

- Vlad

------------------------------

Subject: Digest Footer

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


------------------------------

End of ceph-users Digest, Vol 81, Issue 39
******************************************

