Can you post the output of ceph daemon osd.xx config show? (probably as an attachment).

There are several things that I've seen cause it:

1) too many PGs but too few degraded objects makes it seem "slow" (if you have just 2 degraded objects but restarted a host with 10K PGs, it will probably have to scan all the PGs)
2) sometimes the process gets stuck when a toofull condition occurs
3) sometimes the process gets stuck for no apparent reason - restarting the currently backfilling/recovering OSDs fixes it

Setting osd_recovery_threads sometimes fixes both 2) and 3), but usually not.

4) setting recovery_delay_start to anything > 0 makes recovery slow (even 0.0000001 makes it much slower than a plain 0). On the other hand, we had to set it high as a default because of slow ops when restarting OSDs, which this partially fixed.

Can you see any bottleneck in the system? CPU spinning, disks reading? I don't think this is the issue, just make sure it's not something more obvious...

Jan
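
If the full config show dump is too long to read through, something like the following might help narrow it down to the recovery-related settings. This is only a rough sketch: it assumes the dump is a flat JSON object of option names to values, and the function name recovery_settings and the keyword list are made up for illustration.

```python
import json

# Substrings that mark an option as recovery/backfill related.
# This list is an assumption for illustration; extend it as needed.
KEYWORDS = ("recovery", "backfill")


def recovery_settings(config_json):
    """Return only the recovery/backfill entries from a config dump.

    config_json is the text printed by "ceph daemon osd.xx config show",
    assumed here to be a flat JSON object of option name -> value.
    """
    config = json.loads(config_json)
    return {
        name: value
        for name, value in sorted(config.items())
        if any(word in name for word in KEYWORDS)
    }
```

Saving the dump to a file and passing its contents to recovery_settings() should leave just the handful of options discussed above, which is easier to paste into a reply than the whole attachment.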