I personally use ceph-objectstore-tool to split the subfolders of an offline OSD. I like setting filestore_merge_threshold to a negative number so that subfolders never merge automatically, and then running something like this [1] script that I put together a while ago.
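To make that concrete, here is a rough sketch of the idea (not the actual gist from [1]; the OSD id, pool name, paths and config values are placeholders): a negative filestore_merge_threshold disables merging, and the apply-layout-settings op of ceph-objectstore-tool pre-splits an offline OSD's filestore directories to match the current layout settings.

    # ceph.conf, [osd] section -- illustrative values. A negative merge
    # threshold disables merging; its absolute value still feeds the split
    # formula: a subfolder splits once it holds more than
    # abs(filestore merge threshold) * filestore split multiple * 16 objects
    # (40 * 8 * 16 = 5120 with the numbers below).
    #   filestore merge threshold = -40
    #   filestore split multiple  = 8

    OSD_ID=123                        # placeholder OSD id
    POOL=default.rgw.buckets.data     # placeholder pool name

    systemctl stop ceph-osd@${OSD_ID}

    # Pre-split this OSD's filestore subfolders offline to match the
    # layout settings above.
    ceph-objectstore-tool \
        --data-path /var/lib/ceph/osd/ceph-${OSD_ID} \
        --journal-path /var/lib/ceph/osd/ceph-${OSD_ID}/journal \
        --op apply-layout-settings \
        --pool ${POOL}

    systemctl start ceph-osd@${OSD_ID}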
Any time you backfill from losing or adding storage, it will undo such a drastic split, because the PGs get rebuilt on the new OSD with your production settings, but this still takes care of the vast majority of your subfolder splitting. You just take down one failure domain at a time, run something like this script on each OSD in that failure domain, start them back up, wait for backfilling to finish, and move on to the next.
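In shell terms that loop looks roughly like this (a hedged sketch only, assuming a host-level failure domain; the OSD ids and pool name are placeholders):

    POOL=default.rgw.buckets.data     # placeholder pool name
    OSDS="100 101 102 103"            # the OSDs in one failure domain

    # Keep the OSDs from being marked out while the splits run
    # (splitting a full OSD can take a while).
    ceph osd set noout

    for id in ${OSDS}; do
        systemctl stop ceph-osd@${id}
        ceph-objectstore-tool \
            --data-path /var/lib/ceph/osd/ceph-${id} \
            --op apply-layout-settings \
            --pool ${POOL}
        systemctl start ceph-osd@${id}
    done

    ceph osd unset noout

    # Let recovery/backfill finish before touching the next failure domain.
    while ! ceph health | grep -q HEALTH_OK; do sleep 60; done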
As far as increasing your PG count to avoid splitting goes, that is a viable option as long as you feel good about the resulting PG-per-OSD count. If you do go this route, you can still do a manual, offline subfolder split before you reach the scheduled time for doubling your PGs.
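For reference, the mechanical part of the PG increase is just the following (pool name and target are placeholders; note that the monitors may cap how far pg_num can be raised in a single step via mon_osd_max_split_count, so it is often done in increments):

    POOL=default.rgw.buckets.data     # placeholder pool name
    TARGET=32768                      # placeholder target, e.g. a doubling

    ceph osd pool get ${POOL} pg_num  # current value
    ceph osd df                       # PGS column = current PGs per OSD

    # pg_num creates the new (split) PGs; pgp_num is what actually lets
    # CRUSH rebalance data onto them.
    ceph osd pool set ${POOL} pg_num ${TARGET}
    ceph osd pool set ${POOL} pgp_num ${TARGET}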
[1] https://gist.github.com/drakonstein/cb76c7696e65522ab0e699b7ea1ab1c4
On Fri, Jul 27, 2018, 3:33 PM Aaron Bassett <Aaron.Bassett@xxxxxxxxxxxxx> wrote:
Happy Friday, everyone!
I have a large cluster running Luminous, but still on filestore:
  services:
    osd: 1107 osds: 1031 up, 1030 in
         flags noscrub,nodeep-scrub
    rgw: 5 daemons active

  data:
    pools:   21 pools, 16384 pgs
    objects: 576M objects, 2288 TB
    usage:   3445 TB used, 4044 TB / 7489 TB avail
    pgs:     16339 active+clean
             45    active+clean+scrubbing+deep
I'm at about 46% usage and my filestore subfolders are starting to split, which, naturally, is causing lots of performance problems on the cluster. My instinct is to increase my PG count in order to give them more room and ease up on the splitting. I understand that's going to move a lot of data around, but I'm OK with that, as it will be a one-time event I can plan around, as opposed to the splits firing off randomly and indefinitely.
So I'm looking for a second opinion on whether this is a good course of action, and if so, whether there's a clever way to go about it that avoids hurting the cluster too badly during recovery.
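(The knobs usually turned down to soften that kind of recovery are the backfill/recovery throttles; a rough sketch only, with purely illustrative values:)

    # Throttle backfill and recovery while the new PGs shuffle into place;
    # inject your normal values back once the cluster is healthy again.
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'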
FWIW, I have another, older cluster on similar hardware running Jewel + filestore: 55168 pgs, 27 pools, 3546 TB data, 945 Mobjects. It's running fine, and I've had it flirting with 80% capacity, which is why I feel good about the PG-increase approach versus other options like forcing splits to get them out of the way.
Thanks,
Aaron
_______________________________________________
Ceph-large mailing list
Ceph-large@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com