On Thu, Oct 17, 2019 at 12:08 PM huxiaoyu@xxxxxxxxxxxx <huxiaoyu@xxxxxxxxxxxx> wrote:
>
> I happened to find a note that you wrote in Nov 2015: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-November/006173.html
> and I believe I have just hit exactly the same behavior: a host going down badly drags client performance down to 1/10 (with a 200MB/s recovery workload), and it then took ten minutes to get good control of OSD recovery.
>
> Could you please share how you eventually solved that issue? By setting a fairly large OSD recovery delay start, or some other parameter?

Wow! Dusting off the cobwebs here. I think this is what led me to dig into the code and write the WPQ scheduler. I can't remember doing anything specific. I'm sorry I'm not much help in this regard.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com