This is likely the problem. I'll try to reproduce it today.
(Meant to post this to the list the first time.)
-Sam

On Thu, Dec 15, 2011 at 9:24 AM, Stratos Psomadakis <psomas@xxxxxxxx> wrote:
> On 12/15/2011 06:44 PM, Guido Winkelmann wrote:
>> On Thursday, 15 December 2011, 08:30:26, you wrote:
>>> 'ceph pg dump' will tell you the status (active/clean/scrubbing/etc)
>>> for each pg. Does the same pg remain in state active+clean+scrubbing
>>> for more than 10 minutes?
>> Well, I used ceph -s, which only gave me a summary, but there was
>> definitely a PG that stayed in active+clean+scrubbing for a long time
>> (a lot longer than 10 minutes) and remained so until I restarted one of
>> the OSDs.
>>
>> Unfortunately, I don't know how to reliably reproduce the problem, so I
>> can't check right now...
> When I hit that bug, I was able to trigger it (more easily) by setting:
>   osd scrub max interval = 120
> in the [osd] section of ceph.conf, forcing the cluster to schedule PG
> scrubs more often.
>
> Now, if you stress the cluster a bit (some heavy I/O), coupled with
> single OSD restarts, I think you should be able to trigger it.
>
> Btw, I was using the in-kernel rbd driver.
>
> Some info from the debugging I did: I think that at some point after
> finalizing_scrub is set to true, it turns out that (last_update_applied
> != info.last_update), but the scrub operation is never requeued by
> op_applied for some reason, and so the PG is stuck as scrubbing.
>
>> Guido
> --
> Stratos Psomadakis
> <psomas@xxxxxxxx>
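
For anyone not familiar with the scrub path, here is a rough, self-contained
toy sketch of the condition described in the quoted debugging notes. It is
NOT the actual Ceph OSD code; the names (finalizing_scrub,
last_update_applied, info.last_update, op_applied) are simply borrowed from
the description above to show how a missed requeue in op_applied would leave
a PG stuck in active+clean+scrubbing.

#include <cstdint>
#include <iostream>

// Toy model of the suspected stuck-scrub condition (not real Ceph code).
struct ToyPG {
  bool finalizing_scrub = false;      // set when the scrub starts finalizing
  bool scrub_queued = false;          // whether the scrub has been requeued
  uint64_t last_update_applied = 0;   // last write applied to the store
  struct { uint64_t last_update = 0; } info;  // last write logged for the PG

  // Called when each write finishes being applied locally. The suspicion in
  // the thread is that the real code sometimes never takes this requeue
  // path, so the PG stays in active+clean+scrubbing indefinitely.
  void op_applied(uint64_t applied_version) {
    last_update_applied = applied_version;
    if (finalizing_scrub && last_update_applied == info.last_update) {
      scrub_queued = true;  // stand-in for requeueing the scrub work item
    }
  }
};

int main() {
  ToyPG pg;
  pg.info.last_update = 2;     // two writes are logged for the PG
  pg.finalizing_scrub = true;  // scrub starts finalizing before they apply

  pg.op_applied(1);            // first write applied: not caught up yet
  pg.op_applied(2);            // second write applied: scrub should requeue
  std::cout << "scrub requeued: " << std::boolalpha << pg.scrub_queued << "\n";
  // If op_applied were never called for the last write, or the check above
  // were skipped, scrub_queued would stay false and the PG would look stuck,
  // matching the symptom (stuck until an OSD restart) reported above.
}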