Re: Ceph RGW Index Sharding In Jewel

David Turner <drakonstein@xxxxxxxxx> · Wed, 22 Aug 2018 23:48:07 -0400

The release notes for 0.94.10 mention the introduction of the `radosgw-admin bucket reshard` command. Red Hat's documentation [1] for their enterprise version of Jewel goes into detail on the procedure. You can also search the ML archives for the command to find several conversations about the process, as well as the problems people ran into. Make sure the procedure works on a test bucket on Hammer before attempting it on your 12M-object bucket.

[1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/object_gateway_guide_for_ubuntu/administration_cli#rados_gateway_user_management
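
For sizing the reshard, here is a rough Python sketch that picks a shard count and assembles the command line. The figure of roughly 100,000 objects per index shard is an assumed rule of thumb, not something from the release notes, and the function names and the "test-bucket" name are made up for illustration. Check the `--bucket` and `--num-shards` flags against `radosgw-admin help` on your installed version before running anything.

```python
# Assumption: aim for about 100,000 objects per bucket index shard.
# This is a rule of thumb; adjust it to whatever your docs recommend.
TARGET_OBJECTS_PER_SHARD = 100_000


def suggested_shard_count(num_objects, per_shard=TARGET_OBJECTS_PER_SHARD):
    """Return the smallest shard count that keeps each shard at or under per_shard."""
    if num_objects < 0 or per_shard <= 0:
        raise ValueError("num_objects must be >= 0 and per_shard must be > 0")
    # Integer ceiling division, with a floor of one shard.
    return max(1, (num_objects + per_shard - 1) // per_shard)


def reshard_command(bucket, num_shards):
    """Build the argument list for an offline reshard (flags assumed, verify locally)."""
    return [
        "radosgw-admin", "bucket", "reshard",
        f"--bucket={bucket}",
        f"--num-shards={num_shards}",
    ]


if __name__ == "__main__":
    # "test-bucket" is a hypothetical name; try a throwaway bucket first.
    shards = suggested_shard_count(12_000_000)
    print(" ".join(reshard_command("test-bucket", shards)))
```

The script only prints the command so it can be reviewed by hand; it does not touch the cluster.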

On Wed, Aug 22, 2018, 9:23 PM Russell Holloway <russell.holloway@xxxxxxxxxxx> wrote:

Did I say Jewel? I was too hopeful. I meant hammer. This particular cluster is hammer :(

-Russ

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Russell Holloway <russell.holloway@xxxxxxxxxxx>

Sent: Wednesday, August 22, 2018 8:49:19 PM

To: ceph-users@xxxxxxxxxxxxxx

Subject:  Ceph RGW Index Sharding In Jewel

So, I've finally journeyed deeper into the depths of ceph and discovered a grand mistake that is likely the root cause of many woeful nights of blocked requests. To start off, I'm running jewel, and I know that is dated and I need to upgrade (if anyone knows whether this is a seamless upgrade even though I'm several major versions behind, do let me know).

My current issue is due to an rgw bucket index. I have just discovered I have a bucket with about 12M objects in it. Sharding is not enabled on it. And it's on a spinning disk, not SSD (the journal is on SSD though, so it could be worse?). A bad combination, as I just learned. From my recent understanding, in jewel I could maybe update the rgw region to set max shards for buckets, but it also sounds like this may or may not affect my existing bucket. Furthermore, somewhere I saw mention that prior to luminous, resharding needed to be done offline. I haven't found any documentation on this process though. There is some mention of putting bucket indexes on SSD for performance and latency reasons, which sounds great, but I get the feeling that if I modified the crush map to get the index pool onto SSDs and started moving things around involving this PG, it would fail in the same way that I can't even do a deep scrub on the PG.

Does anyone have a good reference on how I could begin to clean this bucket up or get it sharded while on jewel? Again, it sounds like in luminous it may just start resharding itself and fix itself right up, but I feel going to luminous will require more work and testing (mostly due to my original deployment tool, Fuel 8 for openstack, which is bound to jewel and has no easy upgrade path... I'll have to sort out how to transition away from that while maintaining my existing nodes).

The core issue was identified when I took finer-grained control over deep scrubs and started triggering them manually. I eventually found out I could cause my entire ceph cluster to hang by triggering a deep scrub on a single PG, which happens to be the one hosting this index. The OSD hosting it basically becomes unresponsive for a very long time and begins blocking a lot of other requests, affecting all sorts of VMs using rbd. I could simply not deep scrub this PG (ceph ends up marking the OSD as down, the deep scrub never completes, and about 30 minutes after the requests hang the cluster eventually recovers), but I know I need to address this bucket sizing issue and then work on upgrading ceph.

Is it doable? For what it's worth, I tried to list the keys in ceph with rados and that also hung requests. I'm not quite sure how to break the bucket up at a software level, especially if I cannot list the contents, so I hope there is some route forward within ceph here...

Thanks a bunch in advance for helping a naive ceph operator.

-Russ

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
