Re: Ceph RGW Index Sharding In Jewel

Did I say Jewel? I was too hopeful. I meant hammer. This particular cluster is hammer :(


-Russ


From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Russell Holloway <russell.holloway@xxxxxxxxxxx>
Sent: Wednesday, August 22, 2018 8:49:19 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Ceph RGW Index Sharding In Jewel
 

So, I've finally journeyed deeper into the depths of ceph and discovered a grand mistake that is likely the root cause of many woeful nights of blocked requests. To start off, I'm running jewel, and I know that is dated and I need to upgrade (if anyone knows whether that upgrade is seamless even though I'm several major versions behind, do let me know).


My current issue is due to an RGW bucket index. I have just discovered I have a bucket with about 12M objects in it. Sharding is not enabled on it, and the index lives on a spinning disk, not SSD (the journal is SSD though, so it could be worse?). A bad combination, as I just learned. From my recent understanding, in jewel I could maybe update the rgw region to set max shards for buckets, but it also sounds like that may or may not affect my existing bucket. Furthermore, I saw mention somewhere that prior to luminous, resharding needed to be done offline, but I haven't found any documentation on that process. There is also some mention of putting bucket indexes on SSD for performance and latency reasons, which sounds great, but I get the feeling that if I modified the crush map and tried to move the index pool onto SSDs, the data movement involving this PG would fail in the same way that deep scrubs on it do.
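For concreteness, this is roughly how I've been poking at it so far (the bucket name is made up, and I'm assuming the older region-style commands, which I believe still work on jewel; please correct me if any of this is off):

    # How big the bucket is, and what its marker / instance id are
    radosgw-admin bucket stats --bucket=my-big-bucket

    # num_shards=0 in the instance metadata means an unsharded index, as I understand it
    radosgw-admin metadata get bucket:my-big-bucket
    radosgw-admin metadata get bucket.instance:my-big-bucket:<instance-id>

    # The region-level knob I meant: dump the region, edit bucket_index_max_shards,
    # then load it back and update the region map
    radosgw-admin region get > region.json
    # ... edit "bucket_index_max_shards" in region.json ...
    radosgw-admin region set < region.json
    radosgw-admin regionmap update

My understanding is that bucket_index_max_shards (or rgw_override_bucket_index_max_shards in ceph.conf) only applies to newly created buckets, which is why I don't think it helps my existing 12M-object bucket.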


Does anyone have a good reference on how I could begin to clean this bucket up or get it sharded while on jewel? Again, it sounds like luminous may just start resharding it and fix things right up, but I feel going to luminous will require more work and testing (mostly because my original deployment tool, Fuel 8 for OpenStack, is bound to jewel with no easy upgrade path for Fuel... I'll have to sort out how to transition away from it while maintaining my existing nodes).
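From what I've pieced together, the offline path would look something like the sketch below, but as far as I can tell the reshard subcommand only appeared in later releases and I'm not sure my version has it, so this may be more what I'd run after upgrading than something I can do today:

    # Offline reshard: quiesce writes to the bucket first, since (as I understand it)
    # the old and new index are not kept in sync while this runs
    radosgw-admin bucket reshard --bucket=my-big-bucket --num-shards=128

    # Rule of thumb I keep seeing is ~100k objects per shard,
    # so ~12M objects would want something north of 120 shards

Does that match what others have done, or is there a workaround on older releases that I'm missing?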


The core issue was identified when I took finer-grained control over deep scrubs and triggered them manually. I eventually found that I could make my entire ceph cluster hang by triggering a deep scrub on a single PG, which happens to be the one hosting this index. The OSD hosting it basically becomes unresponsive for a very long time and begins blocking a lot of other requests, affecting all sorts of VMs using rbd. I could simply stop deep scrubbing this PG (ceph ends up marking the OSD as down, the deep scrub seems to fail and never completes, and about 30 minutes after the hung requests the cluster eventually recovers), but I know I need to address this bucket sizing issue and then try to work on upgrading ceph.
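In the meantime I've been keeping scrubs away from that PG with the cluster-wide flags, and using osd map to confirm which PG/OSD actually holds the index object (pool and object names below are just my cluster's defaults; the marker comes out of bucket stats):

    # Keep scrubs off while I sort this out
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # Which PG and OSD hold the (single, unsharded) index object
    ceph osd map .rgw.buckets.index .dir.<bucket-marker>

    # Re-enable later
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub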


Is it doable? For what it's worth, I tried to list the index keys with rados and that also hung requests. I'm not quite sure how to break the bucket up at the software level, especially since I cannot even list the contents, so I hope there is some route forward within ceph here...
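For reference, this is roughly what I ran when it hung (again assuming the default index pool name):

    # Find the index object for the bucket; for an unsharded bucket there is
    # just one .dir.<marker> object carrying all ~12M omap keys
    rados -p .rgw.buckets.index ls | grep '^\.dir\.'

    # This is the listing that blocked everything
    rados -p .rgw.buckets.index listomapkeys .dir.<bucket-marker> | wc -l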


Thanks a bunch in advance for helping a naive ceph operator.


-Russ

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
