Re: PG stuck in scrubbing

Sage Weil <sage@xxxxxxxxxxxx> · Tue, 25 Oct 2011 09:11:56 -0700 (PDT)

On Tue, 25 Oct 2011, Christian Brunner wrote:
> Here is another problem I've seen. Unfortunately I do not have any
> debug output and it's not reproducible.
> 
> While removing an image with "rbd rm" I noticed that rbd stopped
> making progress. When I looked with "ceph -w" I saw a PG that was in
> state "active+clean+scrubbing":
> 
> 2011-10-25 14:01:34.198961    pg v1215175: 2776 pgs: 2775
> active+clean, 1 active+clean+scrubbing; 3000 GB data, 1658 GB used,
> 56737 GB / 59615 GB avail

I hit this a few days ago and wasn't able to pinpoint the cause, but I
did simplify the surrounding scrub code and haven't hit it since.  There
have also been several fixes around op requeuing.  We haven't run the
full battery of tests yet, but in another day or two master should behave
much better.

FWIW, the scrub patch was dd5087fabb2a743741a96ee4610379afa8431f68.

sage

> 
> This state didn't change for half an hour, so I decided to check which
> OSDs were involved:
> 
> pg_stat  objects  mip  degr  unf  kb       bytes       log   disklog  state                   v          reported    up      acting  last_scrub
> 2.163    773      0    0     0    3162113  3238002800  1917  1917     active+clean+scrubbing  2626'7909  2597'10775  [11,7]  [11,7]  2613'7708  2011-10-24 11:54:39.890963
> 
> and I restarted OSD 11. After that everything went back to normal - the
> "rbd rm" finished and scrubbing continued on other PGs. Doing a manual
> scrub on 2.163 is fine, too.
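
For anyone who wants to pick the involved OSDs out of a dump row like the
one quoted above without reading the columns by eye, here is a rough
sketch in Python. It assumes the whitespace-separated layout shown in this
mail, which may differ between versions. parse_pg_row, is_scrubbing and
the "scrub_stamp" label are names made up for illustration; they are not
part of any ceph tool.

```python
# Header and row copied from the quoted pg dump output, with the mail
# client's line wrapping undone.
HEADER = ("pg_stat objects mip degr unf kb bytes log disklog state v "
          "reported up acting last_scrub")
ROW = ("2.163 773 0 0 0 3162113 3238002800 1917 1917 "
       "active+clean+scrubbing 2626'7909 2597'10775 [11,7] [11,7] "
       "2613'7708 2011-10-24 11:54:39.890963")


def parse_pg_row(header, row):
    """Map one whitespace-separated pg row onto the header names."""
    names = header.split()
    fields = row.split()
    pg = dict(zip(names, fields))
    # The row carries a timestamp after last_scrub that has no header
    # name; "scrub_stamp" is a label invented here (assumption).
    pg["scrub_stamp"] = " ".join(fields[len(names):])
    # Turn "[11,7]" into [11, 7]; the first entry is the primary OSD.
    for key in ("up", "acting"):
        pg[key] = [int(x) for x in pg[key].strip("[]").split(",") if x]
    return pg


def is_scrubbing(pg):
    """True if the '+'-joined state string contains 'scrubbing'."""
    return "scrubbing" in pg["state"].split("+")


pg = parse_pg_row(HEADER, ROW)
if is_scrubbing(pg):
    print("pg %s scrubbing, acting %s, primary osd.%d"
          % (pg["pg_stat"], pg["acting"], pg["acting"][0]))
```

For the row above this reports osd.11 as the primary, which is the OSD
that was restarted.
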
> 
> Regards,
> Christian
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html