Re: rgw resharding operation seemingly won't end


 



Thanks for the response, Yehuda.


Status:
[root@objproxy02 UMobjstore]# radosgw-admin reshard status --bucket=$bucket_name
[
    {
        "reshard_status": 1,
        "new_bucket_instance_id": "8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.47370206.1",
        "num_shards": 4
    }
]

I cleared the flag using the bucket check --fix command and will keep an eye on that tracker issue.
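
For reference, this is roughly the sequence I ran (same $bucket_name as above; the trailing comment is my note, not command output):

    radosgw-admin bucket check --fix --bucket=$bucket_name
    radosgw-admin reshard status --bucket=$bucket_name    # confirm the resharding flag is cleared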

Do you have any insight into why the RGWs ultimately paused/reloaded and failed to come back? I am happy to provide more information if that would assist. At the moment we are somewhat nervous to re-enable dynamic sharding, as it seems to have contributed to this problem.
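
In case it helps anyone else following along, this is how we currently have dynamic resharding disabled on the gateways (the section name is just our instance name; the option is rgw_dynamic_resharding):

    [client.rgw.objproxy02]
    rgw_dynamic_resharding = false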

Thanks,
Ryan



> On Oct 9, 2017, at 5:26 PM, Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:
> 
> On Mon, Oct 9, 2017 at 1:59 PM, Ryan Leimenstoll
> <rleimens@xxxxxxxxxxxxxx> wrote:
>> Hi all,
>> 
>> We recently upgraded from Ceph 12.2.0 to 12.2.1 (Luminous); however, we are now seeing issues running radosgw. Specifically, an automatically triggered resharding operation won't end, despite the jobs being cancelled (radosgw-admin reshard cancel). I have also disabled dynamic sharding in ceph.conf for the time being.
>> 
>> 
>> [root@objproxy02 ~]# radosgw-admin reshard list
>> []
>> 
>> The two buckets were also reported in the `radosgw-admin reshard list` output before our RGW frontends paused recently (and only came back after a service restart). These two buckets cannot be written to at this point either.
>> 
>> 2017-10-06 22:41:19.547260 7f90506e9700 0 block_while_resharding ERROR: bucket is still resharding, please retry
>> 2017-10-06 22:41:19.547411 7f90506e9700 0 WARNING: set_req_state_err err_no=2300 resorting to 500
>> 2017-10-06 22:41:19.547729 7f90506e9700 0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Input/output error
>> 2017-10-06 22:41:19.548570 7f90506e9700 1 ====== req done req=0x7f90506e3180 op status=-2300 http_status=500 ======
>> 2017-10-06 22:41:19.548790 7f90506e9700 1 civetweb: 0x55766d111000: $MY_IP_HERE$ - - [06/Oct/2017:22:33:47 -0400] "PUT /$REDACTED_BUCKET_NAME$/$REDACTED_KEY_NAME$ HTTP/1.1" 1 0 - Boto3/1.4.7 Python/2.7.12 Linux/4.9.43-17.39.amzn1.x86_64 exec-env/AWS_Lambda_python2.7 Botocore/1.7.2 Resource
>> [.. slightly later in the logs..]
>> 2017-10-06 22:41:53.516272 7f90406c9700 1 rgw realm reloader: Frontends paused
>> 2017-10-06 22:41:53.528703 7f907893f700 0 ERROR: failed to clone shard, completion_mgr.get_next() returned ret=-125
>> 2017-10-06 22:44:32.049564 7f9074136700 0 ERROR: keystone revocation processing returned error r=-22
>> 2017-10-06 22:59:32.059222 7f9074136700 0 ERROR: keystone revocation processing returned error r=-22
>> 
>> Can anyone advise on the best path forward to stop the current sharding states and avoid this moving forward?
>> 
> 
> What does 'radosgw-admin reshard status --bucket=<bucket>' return?
> I think just manually resharding the buckets should clear this flag,
> is that not an option?
> manual reshard: radosgw-admin bucket reshard --bucket=<bucket>
> --num-shards=<num>
> 
> also, the 'radosgw-admin bucket check --fix' might clear that flag.
> 
> For some reason it seems that the reshard cancellation code is not
> clearing that flag on the bucket index header (pretty sure it used to
> do it at one point). I'll open a tracker ticket.
> 
> Thanks,
> Yehuda
> 
>> 
>> Some other details:
>> - 3 rgw instances
>> - Ceph Luminous 12.2.1
>> - 584 active OSDs, rgw bucket index is on Intel NVMe OSDs
>> 
>> 
>> Thanks,
>> Ryan Leimenstoll
>> rleimens@xxxxxxxxxxxxxx
>> University of Maryland Institute for Advanced Computer Studies
>> 
>> 
>> 
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



