11 pools (7 just for radosgw usage, 2 for cephfs, and a couple of others).
All replicated (size 3, min_size 2). The PG ratio is healthy; ceph is not
complaining about it. Most of the pools have 256 PGs, but the big cephfs
data/metadata pools have 2048.

On Thu, Jun 28, 2018 at 4:24 PM, Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:
>
> In the past I had a couple of non-prod clusters with as few as 50 similar
> HDD-colo OSDs. Convergence was definitely slower than in the 450-OSD
> production clusters, but was faster than this, even with the usual set of
> throttling parameters set all the way down to 1. Do you have multiple
> pools? What's your PG ratio? Replication or EC? I've seen that the
> amplification dynamics of EC have an impact on recovery parallelism.
>
>> On Thu, Jun 28, 2018 at 3:53 PM, Michael Lowe <j.michael.lowe@xxxxxxxxx> wrote:
>>> I tend to bump up the number of backfills; one is the default. In my
>>> environment 16 is the sweet spot.
>>>
>>> ceph tell osd.* injectargs '--osd-max-backfills 16'
>>
>> Did that, but it's not helping much. The recovery rate is pretty slow
>> (our data network is 10 Gbit):
>>
>> recovery: 208 MB/s, 97 objects/s
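
[Editor's note: a quick way to sanity-check the pool layout described at the
top of the thread (replicated vs. EC, size/min_size, PG counts, and the
per-OSD PG ratio); the pool name cephfs_data is only an illustrative
placeholder, not something taken from the thread.]

# List every pool with its rule, size/min_size, and pg_num
ceph osd pool ls detail

# Per-OSD utilization and PG counts (the PGS column gives the PG ratio)
ceph osd df tree

# Check a single pool's settings (pool name is an example)
ceph osd pool get cephfs_data pg_num
ceph osd pool get cephfs_data size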
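
[Editor's note: a rough sketch of the recovery throttles discussed above,
assuming a Luminous-era cluster. The values are illustrative starting points
rather than recommendations, and osd_recovery_sleep_hdd only exists on
Luminous or later.]

# Allow more concurrent backfills per OSD (default is 1)
ceph tell osd.* injectargs '--osd-max-backfills 16'

# Allow more active recovery ops per OSD (default is 3)
ceph tell osd.* injectargs '--osd-recovery-max-active 8'

# Remove the inter-op recovery sleep on HDD OSDs (default 0.1s on Luminous)
ceph tell osd.* injectargs '--osd-recovery-sleep-hdd 0'

# Watch the effect on the recovery/backfill rate
ceph -s

Note that injectargs changes take effect immediately but do not persist
across OSD restarts, so anything that proves useful should also be added to
ceph.conf.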