Hi,

We recently learned on this list about the "rotational_journal = 1" issue for
some (all?) NVMe / SSD setups. We also hit this issue (see below). It would
eventually take a week to recover ... This was all "scratch data", so it
didn't matter anyway.

We recently had to do some recovery / backfilling on our OSD nodes. Only
large objects were stored this time (rbd chunks), so recovery speed was
already much better. Still, we had to crank osd_max_backfills to 6 and
osd_max_recovery to 3 to get some more recovery performance.

TL;DR: we set osd_recovery_sleep_hdd to 0 as well as
osd_recovery_sleep_hybrid to 0 and had another node recover. Already with
default recovery settings performance was much better. With recovery /
backfills set to 3, recovery went really fast. See [1] for a "before /
after" impression. Max throughput was around 1800 MB/s, each OSD doing some
5K writes. For sure this was not the limit. We would hit max NIC bandwidth
pretty soon though.

ceph++

Gr. Stefan

[1]: https://owncloud.kooman.org/s/mvbMCVLFbWjAyOn#pdfviewer

Quoting Stefan Kooman (stefan@xxxxxx):
> Hi,
>
> I know I'm not the only one with this question as I have seen similar
> questions on this list:
> How to speed up recovery / backfilling?
>
> Current status:
>
>   pgs:     155325434/800312109 objects degraded (19.408%)
>            1395 active+clean
>            440  active+undersized+degraded+remapped+backfill_wait
>            21   active+undersized+degraded+remapped+backfilling
>
>   io:
>     client:   180 kB/s rd, 5776 kB/s wr, 273 op/s rd, 440 op/s wr
>     recovery: 2990 kB/s, 109 keys/s, 114 objects/s
>
> What we did? Shut down one DC. Fill cluster with loads of objects, turn
> DC back on (size = 3, min_size=2). To test exactly this: recovery.
>
> I have been going through all the recovery options (including legacy) but
> I cannot get the recovery speed to increase:
>
> osd_recovery_op_priority 63
> osd_client_op_priority 3
>
> ^^ yup, reversed those, to no avail
>
> osd_recovery_max_active 10
>
> ^^ This helped for a short period of time, and then it went back to
> "slow" mode
>
> osd_recovery_max_omap_entries_per_chunk 0
> osd_recovery_max_chunk 67108864
>
> Haven't seen any change in recovery speed.
>
> "osd_recovery_sleep_ssd": "0.000000"
> ^^ default for SSD

Didn't think about the hdd / hybrid settings, as we have all SSD.

Gr. Stefan

-- 
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx
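
For anyone wanting to try the same settings, here is a minimal sketch of how the runtime change could be scripted. It only builds and prints a `ceph tell osd.* injectargs` command line and does not touch a cluster. The helper name `build_injectargs_command` and the dict `RECOVERY_SETTINGS` are made up for illustration. The values are the ones mentioned in the message above, and whether they suit another cluster is an assumption.

```python
import shlex

# Values taken from the message above. Treat them as a starting point,
# not a recommendation: suitability for other clusters is an assumption.
RECOVERY_SETTINGS = {
    "osd_recovery_sleep_hdd": 0,
    "osd_recovery_sleep_hybrid": 0,
    "osd_max_backfills": 3,
}


def build_injectargs_command(settings, target="osd.*"):
    """Return the argv list for a `ceph tell <target> injectargs` call.

    Hypothetical helper: it only assembles the arguments, it runs nothing.
    """
    args = " ".join(f"--{name} {value}" for name, value in settings.items())
    return ["ceph", "tell", target, "injectargs", args]


if __name__ == "__main__":
    # Print the shell-quoted command so it can be reviewed before use.
    print(shlex.join(build_injectargs_command(RECOVERY_SETTINGS)))
```

Printing instead of executing is deliberate, so the command can be checked before it is pasted onto a node with an admin keyring. Settings injected this way are runtime-only and would need to go into the cluster configuration to survive an OSD restart.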