Re: RGW problems after upgrade to Luminous

David Turner <drakonstein@xxxxxxxxx> · Fri, 3 Aug 2018 13:53:42 -0400

I came across you mentioning bucket check --fix before, but I totally forgot that I need to pass --bucket=mybucket with the command for it to actually do anything.  I'm running this now and it does seem to be doing something.  My guess is that the bucket was stuck in that state, and now that I can clean it up I should be able to try resharding it again.  Thank you so much.
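
For anyone scripting this, here is a minimal sketch of the invocation described above, wrapped so the bucket name cannot be left off. The helper names `bucket_check_fix_argv` and `run_bucket_check_fix` are made up for illustration, and the dry-run default is an assumption about how you would want to use it; only the `radosgw-admin bucket check --fix --bucket=...` command itself comes from this thread.

```python
from __future__ import annotations

import shlex
import subprocess


def bucket_check_fix_argv(bucket: str) -> list[str]:
    """Build the argv for `radosgw-admin bucket check --fix` on one bucket.

    Hypothetical helper: it refuses an empty bucket name, since the command
    discussed in this thread did nothing useful without --bucket.
    """
    if not bucket:
        raise ValueError("a bucket name is required")
    return ["radosgw-admin", "bucket", "check", "--fix", f"--bucket={bucket}"]


def run_bucket_check_fix(bucket: str, dry_run: bool = True) -> str:
    """Return the command line (dry run) or run it and return its stdout."""
    argv = bucket_check_fix_argv(bucket)
    if dry_run:
        # Only show what would be executed; nothing touches the cluster.
        return shlex.join(argv)
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return completed.stdout


if __name__ == "__main__":
    print(run_bucket_check_fix("mybucket"))
```

Calling `run_bucket_check_fix("mybucket", dry_run=False)` would execute the command on a node where `radosgw-admin` is installed and configured.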

On Fri, Aug 3, 2018 at 12:50 PM Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
Oh, also -- one thing that might work is running bucket check --fix on
the bucket. That should overwrite the reshard status field in the
bucket index.

Let me know if it happens to fix the issue for you.

Yehuda.

On Fri, Aug 3, 2018 at 9:46 AM, Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
> Is it actually resharding, or is it just stuck in that state?
>
> On Fri, Aug 3, 2018 at 7:55 AM, David Turner <drakonstein@xxxxxxxxx> wrote:
>> I am currently unable to write any data to this bucket in its current
>> state.  Does anyone have any ideas for reverting to the original index
>> shards and canceling the reshard processes running against the bucket?
>>
>> On Thu, Aug 2, 2018 at 12:32 PM David Turner <drakonstein@xxxxxxxxx> wrote:
>>>
>>> I upgraded my last cluster to Luminous last night.  It had some very large
>>> bucket indexes on Jewel, which caused a couple of problems with the upgrade.
>>> Everything eventually finished and we made it to the other side, but now
>>> these errors [1] are filling a lot of our RGW logs and clients are seeing
>>> time skew error responses.  The clocks on the client nodes, RGW nodes, and
>>> the rest of the Ceph cluster match perfectly and all sync from the same
>>> NTP server.
>>>
>>> I tried disabling dynamic resharding for the RGW daemons by placing
>>> `rgw_dynamic_resharding = false` in the ceph.conf for the affected daemons
>>> and restarting them, as well as issuing a reshard cancel for the bucket,
>>> but nothing seems to actually stop the reshard from processing.  Here's the
>>> output of a few commands: [2] reshard list, [3] reshard status.
>>>
>>> Is there anything we can do to actually disable bucket resharding or
>>> let it finish?  I'm out of ideas.  I've tried quite a few things I've
>>> found around, except for manually resharding, which is a last resort here.
>>> This bucket won't exist in a couple of months and the performance is good
>>> enough without resharding, but I don't know how to get it to stop.  Thanks.
>>>
>>> [1] 2018-08-02 16:22:16.047387 7fbe82e61700  0 NOTICE: resharding operation on bucket index detected, blocking
>>> 2018-08-02 16:22:16.206950 7fbe8de77700  0 block_while_resharding ERROR: bucket is still resharding, please retry
>>> 2018-08-02 16:22:12.253734 7fbe4f5fa700  0 NOTICE: request time skew too big now=2018-08-02 16:22:12.000000 req_time=2018-08-02 16:06:03.000000
>>>
>>> [2] $ radosgw-admin reshard list
>>> [2018-08-02 16:13:19.082172 7f3ca4163c80 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000010
>>> 2018-08-02 16:13:19.082757 7f3ca4163c80 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000011
>>> 2018-08-02 16:13:19.083941 7f3ca4163c80 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000012
>>> 2018-08-02 16:13:19.085170 7f3ca4163c80 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000013
>>> 2018-08-02 16:13:19.085898 7f3ca4163c80 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000014
>>> ]
>>> 2018-08-02 16:13:19.086476 7f3ca4163c80 -1 ERROR: failed to list reshard log entries, oid=reshard.0000000015
>>>
>>> [3] $ radosgw-admin reshard status --bucket my-bucket
>>> [
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     },
>>>     {
>>>         "reshard_status": 1,
>>>         "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
>>>         "num_shards": 32
>>>     }
>>> ]
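
The reshard status output above is long but uniform, so it can help to collapse it before reading it. Below is a rough sketch that summarizes the JSON array printed by `radosgw-admin reshard status`; the function name `summarize_reshard_status` and the summary keys are hypothetical. It also assumes that `"reshard_status": 1` means a reshard is marked in progress on that index shard, which is how the "still resharding" log lines in [1] read, but that mapping is an assumption rather than something stated in this thread.

```python
import json
from collections import Counter


def summarize_reshard_status(status_json: str) -> dict:
    """Collapse per-shard reshard status entries into a short summary."""
    shards = json.loads(status_json)
    return {
        # Number of entries in the array, one per existing index shard.
        "shards": len(shards),
        # How many shards report each reshard_status value.
        "status_counts": dict(Counter(s["reshard_status"] for s in shards)),
        # Distinct target bucket instances and target shard counts.
        "target_instances": sorted({s["new_bucket_instance_id"] for s in shards}),
        "target_num_shards": sorted({s["num_shards"] for s in shards}),
    }


if __name__ == "__main__":
    # Rebuild the 16 identical entries from the output quoted in this thread.
    entry = {
        "reshard_status": 1,
        "new_bucket_instance_id": "b7567cda-7d6f-4feb-86d6-bbd9da36b14d.141873449.1",
        "num_shards": 32,
    }
    print(summarize_reshard_status(json.dumps([entry] * 16)))
```

For the output in this thread, the summary shows 16 entries that all carry status 1 and all point at the same new bucket instance with 32 shards, so every existing index shard agrees that a reshard to 32 shards is pending.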

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com