Re: pg incomplete second osd in acting set still available

So one more update. 

I suspect I may need to do more than force the secondary osd to become
the primary, given the reported state of the pg. The pg reports a state
that it believes is correct but that is actually inaccurate.

In the dump for one of the pgs below, the version (epoch'version) is 0'0
and the pg reportedly contains 0 bytes. This is obviously not what I want.
I suspect failing over to the secondary won't magically fix this state.

        { "pgid": "3.5",
          "version": "0'0",
          "reported": "14700'72",
          "state": "incomplete",
          "last_fresh": "2016-03-25 14:19:14.124358",
          "last_change": "2016-03-25 11:40:51.430747",
          "last_active": "0.000000",
          "last_clean": "0.000000",
          "last_unstale": "2016-03-25 14:19:14.124358",
          "mapping_epoch": 14699,
          "log_start": "0'0",
          "ondisk_log_start": "0'0",
          "created": 237,
          "last_epoch_clean": 237,
          "parent": "0.0",
          "parent_split_bits": 0,
          "last_scrub": "13159'1673981",
          "last_scrub_stamp": "2016-03-24 06:38:07.016619",
          "last_deep_scrub": "12968'1671815",
          "last_deep_scrub_stamp": "2016-03-17 13:49:16.150190",
          "last_clean_scrub_stamp": "2016-03-24 06:38:07.016619",
          "log_size": 0,
          "ondisk_log_size": 0,
          "stats_invalid": "0",
          "stat_sum": { "num_bytes": 0,
              "num_objects": 0,
              "num_object_clones": 0,
              "num_object_copies": 0,
              "num_objects_missing_on_primary": 0,
              "num_objects_degraded": 0,
              "num_objects_unfound": 0,
              "num_read": 0,
              "num_read_kb": 0,
              "num_write": 0,
              "num_write_kb": 0,
              "num_scrub_errors": 0,
              "num_objects_recovered": 0,
              "num_bytes_recovered": 0,
              "num_keys_recovered": 0},
          "stat_cat_sum": {},
          "up": [
                53,
                22],
          "acting": [
                53,
                22]},
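To pick the suspicious pgs out of a full dump, a quick heuristic helps: flag any pg whose version is 0'0 and whose stat_sum shows 0 bytes even though its scrub history (last_scrub 13159'1673981 here) says it once held data. This is a minimal sketch under my own assumptions, not a Ceph rule; the helper names parse_eversion and looks_emptied are hypothetical, and it only reads fields that appear in the dump above.

```python
import json


def parse_eversion(s: str) -> tuple[int, int]:
    """Split an "epoch'version" string such as "14700'72" into (epoch, version)."""
    epoch, version = s.split("'")
    return int(epoch), int(version)


def looks_emptied(pg: dict) -> bool:
    """Hypothetical heuristic: the pg claims no log and no data now,
    yet its scrub history shows it once held writes."""
    empty_now = (parse_eversion(pg["version"]) == (0, 0)
                 and pg["stat_sum"]["num_bytes"] == 0)
    had_history = parse_eversion(pg["last_scrub"]) > (0, 0)
    return empty_now and had_history


# Trimmed copy of the pg 3.5 entry from the dump above.
pg_3_5 = json.loads("""
{ "pgid": "3.5", "version": "0'0", "reported": "14700'72",
  "state": "incomplete", "last_scrub": "13159'1673981",
  "stat_sum": { "num_bytes": 0, "num_objects": 0 },
  "acting": [53, 22] }
""")

if looks_emptied(pg_3_5):
    print(f"pg {pg_3_5['pgid']}: reports 0'0 / 0 bytes but was last scrubbed at "
          f"{pg_3_5['last_scrub']}; primary osd.{pg_3_5['acting'][0]} may hold an empty copy")
```

The same check could be looped over every entry in a full pg dump; comparing eversions as (epoch, version) tuples keeps the ordering correct without string tricks.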

I'm a little confused by the description of peering. I would have
imagined that taking an osd out of the cluster would be like saying it
doesn't exist anymore, so there wouldn't be an expectation of continuity
of state.

http://docs.ceph.com/docs/master/dev/peering/#description-of-the-peering-process
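To check my reading of that page, here is a simplified model of it: for every past interval in which the pg may have accepted writes, peering needs at least one osd from that interval's acting set that is up now, unless those osds have been declared lost. Whether an osd is "in" or "out" does not appear in this test, which would explain why marking an osd out is not the same as saying it no longer exists. The Interval type, the blocking_intervals function, and the epochs in the example are hypothetical; this is a sketch of my understanding, not Ceph's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One past interval of a pg (hypothetical, simplified model)."""
    first_epoch: int
    last_epoch: int
    acting: frozenset[int]   # osd ids in the acting set during the interval
    maybe_went_rw: bool      # could clients have written during the interval?


def blocking_intervals(past: list[Interval], up: set[int], lost: set[int]) -> list[Interval]:
    """Return past intervals that would stall peering in this simplified model.

    An interval that may have accepted writes needs at least one member that
    is up now, so its view of those writes can be queried. If no member is up,
    the interval only stops blocking once every member has been declared lost.
    "In" vs "out" plays no part in this test.
    """
    blocked = []
    for iv in past:
        if not iv.maybe_went_rw:
            continue  # nothing could have been written, nothing to recover
        if iv.acting & up:
            continue  # a surviving member can report what was written
        if iv.acting <= lost:
            continue  # operator has given up on that data
        blocked.append(iv)
    return blocked


# Made-up history for a two-replica pg: both osds, then osd.22 alone.
history = [
    Interval(237, 14000, frozenset({53, 22}), True),
    Interval(14001, 14698, frozenset({22}), True),
]

print(blocking_intervals(history, up={53, 22}, lost=set()))  # nothing blocks
print(blocking_intervals(history, up={53}, lost=set()))      # the osd.22-only interval blocks
print(blocking_intervals(history, up={53}, lost={22}))       # unblocked once osd.22 is lost
```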

Thanks again for any pointers,

~Jpr


On 03/25/2016 05:30 PM, John-Paul Robinson wrote:
> So I think I know what might have gone wrong.
>
> When I took my osds out of the cluster and shut them down, the first
> set of osds likely came back up and in the cluster before 300 seconds
> expired. This would have prevented the cluster from triggering recovery
> of the pg from the replica osd.
>
> So the question is, can I force this to happen?  Can I take the supposed
> primary osd down for 300+ seconds to allow the cluster to start
> recovering the pgs (this will of course affect all other pgs on the
> osds).  Or is there a better way?
>
> Note that all my secondary osds in these pgs have the expected amount of
> data for the pg, remained up during the primary's downtime and should
> have the state to become the primary for the acting set.
>
> Thanks for listening.
>
> ~jpr
>
>
> On 03/25/2016 11:57 AM, John-Paul Robinson wrote:
>> Hi Folks,
>>
>> One last dip into my old bobtail cluster.  (new hardware is on order)
>>
>> I have three pgs in an incomplete state. The cluster was previously
>> stable but in a HEALTH_WARN state due to a few near full osds. I
>> started resizing drives on one host to expand space after taking the
>> osds that served them out and down. My failure domain has two levels,
>> osds and hosts, and I keep two copies per placement group.
>>
>> I have three of my pgs flagging incomplete.
>>
>> root@d90-b1-1c-3a-c4-8f:~# date; sudo ceph --id nova health detail |
>> grep incomplete
>> Fri Mar 25 11:28:47 CDT 2016
>> HEALTH_WARN 168 pgs backfill; 107 pgs backfilling; 241 pgs degraded; 3
>> pgs incomplete; 3 pgs stuck inactive; 287 pgs stuck unclean; recovery
>> 4913393/39589336 degraded (12.411%);  recovering 120 o/s, 481MB/s; 4
>> near full osd(s)
>> pg 3.5 is stuck inactive since forever, current state incomplete, last acting [53,22]
>> pg 3.150 is stuck inactive since forever, current state incomplete, last acting [50,74]
>> pg 3.38c is stuck inactive since forever, current state incomplete, last acting [14,70]
>> pg 3.5 is stuck unclean since forever, current state incomplete, last acting [53,22]
>> pg 3.150 is stuck unclean since forever, current state incomplete, last acting [50,74]
>> pg 3.38c is stuck unclean since forever, current state incomplete, last acting [14,70]
>> pg 3.38c is incomplete, acting [14,70]
>> pg 3.150 is incomplete, acting [50,74]
>> pg 3.5 is incomplete, acting [53,22]
>>
>> Given that incomplete means:
>>
>> "Ceph detects that a placement group is missing information about writes
>> that may have occurred, or does not have any healthy copies. If you see
>> this state, try to start any failed OSDs that may contain the needed
>> information or temporarily adjust min_size to allow recovery."
>>
>> I have restarted all osds in these acting sets and they log normally,
>> opening their respective journals and such. However, the incomplete
>> state remains.
>>
>> All three of the primary osds (53, 50, 14) were reformatted to expand
>> their size, so I know there's no "spare" journal if it's referring to
>> what was there before. Btw, I did take all osds out and down before
>> resizing their drives, so I'm not sure how these pgs could still be
>> expecting the old journal.
>>
>> I suspect I need to forgo the journal and let the secondaries become
>> primary for these pgs.
>>
>> I sure hope that's possible.
>>
>> As always, thanks for any pointers.
>>
>> ~jpr
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
