Hi,

Recent versions of Ceph introduce some unexpected behavior for permanent connections (VM or kernel clients): after crash recovery, I/O hangs on the next planned scrub in the following scenario:

- launch a bunch of clients doing non-intensive writes,
- lose one or more OSDs, mark them down, and wait for recovery to complete,
- do a slow scrub, e.g. scrubbing one OSD every 5 minutes from a bash script (see the sketch at the end of this mail), or wait for Ceph to do the same on its own,
- observe a rising number of PGs stuck in the active+clean+scrubbing state (they took over the primary role from PGs that were on the killed OSDs, and almost certainly were being written to at the time of the crash),
- some time later, clients hang hard and the Ceph log reports stuck (old) I/O requests.

The only way to bring clients back without losing their I/O state is a per-OSD restart, which also gets rid of the active+clean+scrubbing PGs.

First of all, I'll be happy to help solve this problem by providing logs.

The second question is not directly related to this problem, but I have been thinking about it for a long time: are there any planned features to control the scrub process more precisely, e.g. a per-PG scrub rate or scheduled scrubs, instead of the current set of timeouts, which are of course not very predictable as to when scrubbing will actually run?

Thanks!
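
P.S. For concreteness, here is roughly what I mean by a "slow scrub" loop. This is only a minimal sketch: the 5 minute interval and the final grep are illustrative, and it assumes the standard ceph CLI (ceph osd ls, ceph osd scrub, ceph pg dump).

#!/bin/bash
# Kick off a scrub on one OSD at a time, waiting 5 minutes between them.

INTERVAL=300   # seconds between OSDs, i.e. one OSD per 5 minutes

# Iterate over all OSD ids currently known to the cluster.
for osd in $(ceph osd ls); do
    echo "scrubbing osd.${osd}"
    ceph osd scrub "${osd}"
    sleep "${INTERVAL}"
done

# Afterwards, check for PGs that never leave the scrubbing state.
ceph pg dump | grep 'active+clean+scrubbing'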