Parallelism. The backfill/recovery tunables control how many recovery ops a given OSD will perform. If you're adding a new OSD, it is naturally the bottleneck. For other forms of data movement, many OSDs read and write independently early on. Toward the end, fewer and fewer OSDs still have work to do, so there's a long tail as they drain their queues.

> On Oct 11, 2019, at 4:42 AM, Eugen Block <eblock@xxxxxx> wrote:
>
> Yeah, we also noticed decreasing recovery speed once it gets down to the last PGs, but we never came up with a theory. I think your explanation makes sense. Next time I'll try with much higher values, thanks for sharing that.
>
> Regards,
> Eugen
>
>
> Quoting Frank Schilder <frans@xxxxxx>:
>
>> I did a lot of data movement lately, and my observation is that backfill is very fast (high bandwidth and many thousand keys/s) as long as it runs many-to-many across OSDs. The number of OSDs participating slowly decreases over time until only one disk is left being written to. That becomes really slow, because the recovery options are tuned to keep all-to-all traffic under control.
>>
>> In such a case, you might want to temporarily increase these numbers to something really high (not 10 or 20, but 1000 or 2000; increase in steps) until the single-disk write is over, and then set them back again. With SSDs this should be OK.
>>
>> Best regards,
>>
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Eugen Block <eblock@xxxxxx>
>> Sent: 11 October 2019 10:24
>> To: Frank Schilder
>> Cc: ceph-users@xxxxxxx
>> Subject: Re: Nautilus: PGs stuck remapped+backfilling
>>
>>> Your metadata PGs *are* backfilling. That is the "61 keys/s" figure
>>> in the recovery I/O line of the ceph status output. If this is too
>>> slow, increase osd_max_backfills and osd_recovery_max_active.
>>>
>>> Or just have some coffee ...
>>
>>
>> I had already increased osd_max_backfills and osd_recovery_max_active
>> in order to speed things up, and most of the PGs were remapped pretty
>> quickly (a couple of minutes), but these last 3 PGs took almost two hours
>> to complete, which was unexpected.
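
P.S. A minimal sketch of the "raise temporarily, then set it back" approach described above, assuming a Nautilus cluster with the centralized config database; the values here are illustrative only, not recommendations, and should be raised in steps while keeping an eye on client I/O:

  # raise the throttles cluster-wide for the tail end of the backfill
  ceph config set osd osd_max_backfills 16
  ceph config set osd osd_recovery_max_active 16

  # or push the change to the running daemons only, without persisting it
  ceph tell osd.* injectargs '--osd_max_backfills=16 --osd_recovery_max_active=16'

  # once the last PGs finish, revert to the compiled-in defaults
  ceph config rm osd osd_max_backfills
  ceph config rm osd osd_recovery_max_active

The injectargs variant only affects daemons that are currently running, which can be handy if you do not want the higher values to survive an OSD restart.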