Re: pg incomplete state

Greg,

Thanks for the insight.  I suspect things are still somewhat sane, given
that I erased the primary (osd.30) and the secondary (osd.11) still
contains the pg data.

If I may, could you clarify the backfill process a little?

I understand that min_size allows I/O to resume while only that many
replicas are available (i.e. 1 once changed), and that this would let
things move forward.

I would expect, however, that some backfill would already be on-going
for pg 3.ea on osd.30.  As far as I can tell, there isn't anything
happening.  The pg 3.ea directory is just as empty today as it was
yesterday.

Will changing the min_size actually trigger backfill to begin for a pg
if it has stalled or never got started?
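
In case it's useful, this is where I've been looking so far; the pg query
in particular is where I'd expect recovery/backfill state to show up, if
I understand it correctly:

  ceph health detail
  ceph pg 3.ea query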

An alternative idea I had was to take osd.30 back out of the cluster so
that pg 3.ea [30,11] would get mapped to some other osd to maintain
replication.  This seems a bit heavy-handed though, given that only this
one pg is affected.
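
If I did go that route, I assume it would just be the usual out/in dance,
something like the following (untested):

  ceph osd out 30
  (wait for the pg to remap and backfill elsewhere)
  ceph osd in 30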

Thanks for any follow up.

~jpr 


On 10/21/2015 01:21 PM, Gregory Farnum wrote:
> On Tue, Oct 20, 2015 at 7:22 AM, John-Paul Robinson <jpr@xxxxxxx> wrote:
>> Hi folks
>>
>> I've been rebuilding drives in my cluster to add space.  This has gone
>> well so far.
>>
>> After the last batch of rebuilds, I'm left with one placement group in
>> an incomplete state.
>>
>> [sudo] password for jpr:
>> HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean
>> pg 3.ea is stuck inactive since forever, current state incomplete, last
>> acting [30,11]
>> pg 3.ea is stuck unclean since forever, current state incomplete, last
>> acting [30,11]
>> pg 3.ea is incomplete, acting [30,11]
>>
>> I've restarted both OSDs a few times, but it hasn't cleared the error.
>>
>> On the primary I see errors in the log related to slow requests:
>>
>> 2015-10-20 08:40:36.678569 7f361585c700  0 log [WRN] : 8 slow requests,
>> 3 included below; oldest blocked for > 31.922487 secs
>> 2015-10-20 08:40:36.678580 7f361585c700  0 log [WRN] : slow request
>> 31.531606 seconds old, received at 2015-10-20 08:40:05.146902:
>> osd_op(client.158903.1:343217143 rb.0.25cf8.238e1f29.00000000a044 [read
>> 1064960~262144] 3.ae9968ea RETRY) v4 currently reached pg
>> 2015-10-20 08:40:36.678592 7f361585c700  0 log [WRN] : slow request
>> 31.531591 seconds old, received at 2015-10-20 08:40:05.146917:
>> osd_op(client.158903.1:343217144 rb.0.25cf8.238e1f29.00000000a044 [read
>> 2113536~262144] 3.ae9968ea RETRY) v4 currently reached pg
>> 2015-10-20 08:40:36.678599 7f361585c700  0 log [WRN] : slow request
>> 31.531551 seconds old, received at 2015-10-20 08:40:05.146957:
>> osd_op(client.158903.1:343232634 ekessler-default.rbd [watch 35~0]
>> 3.e4bd50ea) v4 currently reached pg
>>
>> Notes online suggest this is an issue with the journal and that it may
>> be possible to export and rebuild the pg.  I'm not running Firefly, though.
>>
>> https://ceph.com/community/incomplete-pgs-oh-my/
>>
>> Interestingly, pg 3.ea appears to be complete on osd.11 (the secondary)
>> but missing entirely on osd.30 (the primary).
>>
>> on osd.30 (primary):
>>
>> crowbar@da0-36-9f-0e-2b-88:~$ du -sk
>> /var/lib/ceph/osd/ceph-30/current/3.ea_head/
>> 0       /var/lib/ceph/osd/ceph-30/current/3.ea_head/
>>
>> on osd.11 (secondary):
>>
>> crowbar@da0-36-9f-0e-2b-40:~$ du -sh
>> /var/lib/ceph/osd/ceph-11/current/3.ea_head/
>> 63G     /var/lib/ceph/osd/ceph-11/current/3.ea_head/
>>
>> This makes some sense, since my disk drive rebuilding activity
>> reformatted the primary, osd.30.  It also gives me some hope that my
>> data is not lost.
>>
>> I understand incomplete means a problem with the journal, but is there
>> a way to dig deeper into this, or is it possible to get the secondary's
>> data to take over?
> If you're running an older version of Ceph (Firefly or earlier,
> maybe?), "incomplete" can also mean "not enough replicas". It looks
> like that's what you're hitting here, if osd.11 is not reporting any
> issues. If so, simply setting the min_size on this pool to 1 until the
> backfilling is done should let you get going.
> -Greg
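
If I read Greg's suggestion right, the concrete change would be something
like the following (untested; <pool> is a placeholder for whichever pool
pg 3.ea belongs to, and restoring min_size to 2 afterwards is my
assumption based on the two-OSD acting set):

  ceph osd pool set <pool> min_size 1
  (wait for backfill to finish)
  ceph osd pool set <pool> min_size 2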

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


