Bug in RadosGW resharding? Hangs again...

Martin Emrich <martin.emrich@xxxxxxxxxxx> · Mon, 15 Jan 2018 17:04:08 +0100

Hi!

After ending up with a completely broken radosgw setup due to damaged 
buckets, I deleted all rgw pools and started from scratch.

But the problem is reproducible: after pushing about 100,000 objects into 
a bucket, the resharding process appears to start, and the bucket becomes 
unresponsive.

I just see lots of these messages in all rgw logs:

2018-01-15 16:57:45.108826 7fd1779b1700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-01-15 16:57:45.119184 7fd1779b1700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-01-15 16:57:45.260751 7fd1120e6700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-01-15 16:57:45.280410 7fd1120e6700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-01-15 16:57:45.300775 7fd15b979700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-01-15 16:57:45.300971 7fd15b979700  0 WARNING: set_req_state_err err_no=2300 resorting to 500
2018-01-15 16:57:45.301042 7fd15b979700  0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Input/output error
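
To get a feel for how often each of these messages occurs, something like the following could be run over an rgw log. This is only a minimal sketch: the line layout (date, time, thread id, log level, message) is inferred from the excerpt above, and the category labels and the function names `classify` and `tally` are made up for illustration, not anything radosgw defines.

```python
import re
from collections import Counter

# Assumed rgw log layout, inferred from the excerpt above:
# date, time, thread id (hex), log level, message.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}\.\d+) ([0-9a-f]+)\s+(-?\d+) (.*)$"
)


def classify(message):
    """Map a log message to a coarse category (labels are hypothetical)."""
    if "block_while_resharding ERROR" in message:
        return "still_resharding"
    if "resharding operation on bucket index detected" in message:
        return "blocking"
    if "set_req_state_err" in message:
        return "http_500"
    if "returned err=Input/output error" in message:
        return "io_error"
    return "other"


def tally(lines):
    """Count messages per category, skipping lines that do not match the layout."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[classify(match.group(5))] += 1
    return counts


# Sample lines copied from the excerpt above (unwrapped).
SAMPLE = [
    "2018-01-15 16:57:45.108826 7fd1779b1700  0 block_while_resharding ERROR: bucket is still resharding, please retry",
    "2018-01-15 16:57:45.119184 7fd1779b1700  0 NOTICE: resharding operation on bucket index detected, blocking",
    "2018-01-15 16:57:45.300971 7fd15b979700  0 WARNING: set_req_state_err err_no=2300 resorting to 500",
    "2018-01-15 16:57:45.301042 7fd15b979700  0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Input/output error",
]

if __name__ == "__main__":
    for category, count in tally(SAMPLE).most_common():
        print(category, count)
```

For a real log, the lines of the file would be passed to `tally` instead of `SAMPLE`. A steadily growing count of the first two categories with no other activity would at least support the impression that requests are only being blocked and retried.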

One radosgw process and two OSDs housing the bucket index/metadata are 
still busy, but it seems to be stuck again.

How long is this resharding process supposed to take? I cannot believe 
that an application is supposed to block for more than half an hour...

I feel inclined to open a bug report, but I am not yet sure where the 
problem lies.

Some information:

* 3 RGW processes, 3 OSD hosts with 12 HDD OSDs and 6 SSD OSDs
* Ceph 12.2.2
* Auto-Resharding on, Bucket Versioning & Lifecycle rule enabled.

Thanks,

Martin
