Re: Troubleshooting incomplete PG's

Brad Hubbard <bhubbard@xxxxxxxxxx> · Sun, 2 Apr 2017 10:11:14 +1000

On Fri, Mar 31, 2017 at 5:19 AM, nokia ceph <nokiacephusers@xxxxxxxxx> wrote:
> Hello Brad,
>
> Many thanks for the info :)
>
> ENV: Kraken - BlueStore - EC 4+1 - 5-node cluster - RHEL7
>
> What is the status of the down+out osd? Only one OSD, osd.6, is down and out
> of the cluster.
> What role did/does it play? Most importantly, is it osd.6? Yes. Due to an
> underlying I/O error, we removed this device from the cluster.
>
> I set "osd_find_best_info_ignore_history_les = true" in ceph.conf and found
> that those 22 PGs changed to "down+remapped". Now all of them have reverted
> to the "remapped+incomplete" state.
>
> #ceph pg stat 2> /dev/null
> v2731828: 4096 pgs: 1 incomplete, 21 remapped+incomplete, 4074 active+clean;
> 268 TB data, 371 TB used, 267 TB / 638 TB avail
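As a quick cross-check of the summary above, the per-state counts should add up to the PG total, and the incomplete states should account for the 22 stuck PGs. A minimal Python sketch, assuming the Kraken-era text format of "ceph pg stat" shown here; the parser and its name are hypothetical, not part of Ceph:

```python
import re

def parse_pg_stat(line):
    """Split a `ceph pg stat` summary into (total_pgs, {state: count})."""
    # Expected shape: "v<version>: <N> pgs: <count> <state>, ...; <usage...>"
    m = re.search(r"(\d+) pgs: ([^;]+);", line)
    if m is None:
        raise ValueError("unrecognised pg stat line: %r" % line)
    states = {}
    for part in m.group(2).split(","):
        count, state = part.strip().split(" ", 1)
        states[state] = int(count)
    return int(m.group(1)), states

line = ("v2731828: 4096 pgs: 1 incomplete, 21 remapped+incomplete, "
        "4074 active+clean; 268 TB data, 371 TB used, 267 TB / 638 TB avail")
total, states = parse_pg_stat(line)
# Any state whose "+"-separated parts include "incomplete".
incomplete = sum(n for s, n in states.items() if "incomplete" in s.split("+"))
print(total, sum(states.values()), incomplete)  # 4096 4096 22
```

1 + 21 + 4074 = 4096, and 1 + 21 = 22 matches the "22 pgs incomplete" line in the ceph -s output.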
>
> ## ceph -s
> 2017-03-30 19:02:14.350242 7f8b0415f700 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2017-03-30 19:02:14.366545 7f8b0415f700 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
>     cluster bd8adcd0-c36d-4367-9efe-f48f5ab5f108
>      health HEALTH_ERR
>             22 pgs are stuck inactive for more than 300 seconds
>             22 pgs incomplete
>             22 pgs stuck inactive
>             22 pgs stuck unclean
>      monmap e2: 5 mons at
> {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
>             election epoch 180, quorum 0,1,2,3,4

Are you *actually* trying to create a cluster that is as
geographically dispersed as these machine names indicate?

> au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide
>         mgr active: au-adelaide
>      osdmap e6506: 117 osds: 117 up, 117 in; 21 remapped pgs
>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>       pgmap v2731828: 4096 pgs, 1 pools, 268 TB data, 197 Mobjects
>             371 TB used, 267 TB / 638 TB avail
>                 4074 active+clean
>                   21 remapped+incomplete
>                    1 incomplete
>
>
> ## ceph osd dump 2>/dev/null | grep cdvr
> pool 1 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash
> rjenkins pg_num 4096 pgp_num 4096 last_change 456 flags
> hashpspool,nodeep-scrub stripe_width 65536
>
> Inspecting affected PG 1.e4b
>
> # ceph pg dump 2> /dev/null | grep 1.e4b
> 1.e4b     50832                  0        0         0       0 73013340821
> 10006    10006 remapped+incomplete 2017-03-30 14:14:26.297098 3844'161662
> 6506:325748 [113,66,15,73,103]        113  [NONE,NONE,NONE,73,NONE]
> 73 1643'139486 2017-03-21 04:56:16.683953             0'0 2017-02-21
> 10:33:50.012922
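In this row the up set is [113,66,15,73,103] and the acting set is [NONE,NONE,NONE,73,NONE]; for an EC pool the list position is the shard id. A minimal Python sketch (hypothetical helper names; it assumes the pg dump column order up, up_primary, acting, acting_primary) that lists the shards CRUSH maps to an OSD but that have no OSD in the acting set:

```python
import re

def parse_osd_set(text):
    """'[113,NONE,73]' -> [113, None, 73]; list index = EC shard id."""
    inner = text.strip()[1:-1]
    return [None if tok == "NONE" else int(tok) for tok in inner.split(",") if tok]

def up_and_acting(row):
    """Return (up, acting) from one joined `ceph pg dump` row.

    Assumes the first two bracketed lists in the row are the up set and
    the acting set, in that order.
    """
    up_txt, acting_txt = re.findall(r"\[[^\]]*\]", row)[:2]
    return parse_osd_set(up_txt), parse_osd_set(acting_txt)

row = ("1.e4b 50832 0 0 0 0 73013340821 10006 10006 remapped+incomplete "
       "2017-03-30 14:14:26.297098 3844'161662 6506:325748 "
       "[113,66,15,73,103] 113 [NONE,NONE,NONE,73,NONE] 73 "
       "1643'139486 2017-03-21 04:56:16.683953 0'0 2017-02-21 10:33:50.012922")

up, acting = up_and_acting(row)
# (shard id, OSD in the up set) for every slot left empty in the acting set.
unserved = [(shard, up[shard]) for shard, osd in enumerate(acting) if osd is None]
print(unserved)  # [(0, 113), (1, 66), (2, 15), (4, 103)]
```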
>
> When I ran the command below:
>
> #ceph pg force_create_pg 1.e4b
> pg 1.e4b now creating, ok
>
> It went to the creating state, and nothing has changed since. Can you explain
> why this PG shows null values after running "force_create_pg"?
>
> ]# ceph pg dump 2> /dev/null | grep 1.e4b
> 1.e4b         0                  0        0         0       0           0
> 0        0            creating 2017-03-30 19:07:00.982178         0'0
> 0:0                 []         -1                        []             -1
> 0'0                   0.000000             0'0                   0.000000
>
> Then I ran the command below:
>
> # ceph pg  repair 1.e4b
> Error EAGAIN: pg 1.e4b has no primary osd  --<<
>
> Could you please answer the following questions?
>
> 1. How do we fix this "incomplete+remapped" PG issue? All OSDs are up and
> running, and the affected OSD was marked out and removed from the cluster.
> 2. Will reducing min_size help? It is currently set to 4. Could you please
> explain the impact of reducing min_size for the current EC 4+1
> configuration?
> 3. Is there a procedure to safely remove an affected PG? As far as I
> understand, this is the command:
>
> ===
> #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph --pgid 1.e4b --op remove
> ===
>
> Awaiting your suggestions on how to proceed.
>
> Thanks
>
> On Thu, Mar 30, 2017 at 7:32 AM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>
>> On Thu, Mar 30, 2017 at 4:53 AM, nokia ceph <nokiacephusers@xxxxxxxxx>
>> wrote:
>> > Hello,
>> >
>> > Env:-
>> > 5-node, EC 4+1, BlueStore, Kraken v11.2.0, RHEL7.2
>> >
>> > As part of our resiliency testing with Kraken BlueStore, we found several
>> > PGs in the incomplete+remapped state. We tried to repair each PG using
>> > "ceph pg repair <pgid>", with no luck. We then planned to remove the
>> > incomplete PGs using the procedure below.
>> >
>> >
>> > #ceph health detail | grep  1.e4b
>> > pg 1.e4b is remapped+incomplete, acting [2147483647,66,15,73,2147483647]
>> > (reducing pool cdvr_ec min_size from 4 may help; search ceph.com/docs for 'incomplete')
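The value 2147483647 (0x7fffffff) is the numeric value of CRUSH_ITEM_NONE, the placeholder for an empty slot; ceph pg dump prints the same slot as NONE. Because an erasure-coded object can only be decoded from at least k shards, a simple count shows why this PG cannot go active. A minimal sketch, assuming k = 4 from the 4+1 profile in this thread:

```python
# Numeric value of CRUSH_ITEM_NONE: an empty slot in an EC up/acting set.
CRUSH_ITEM_NONE = 0x7FFFFFFF  # == 2147483647

K = 4  # data chunks; assumes the EC 4+1 profile described in this thread

def shards_present(acting):
    """Count acting-set slots that are actually backed by an OSD."""
    return sum(1 for osd in acting if osd != CRUSH_ITEM_NONE)

acting = [2147483647, 66, 15, 73, 2147483647]  # from `ceph health detail` above
have = shards_present(acting)
print(have, "decodable" if have >= K else "not decodable")  # 3 not decodable
```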
>>
>> "Incomplete Ceph detects that a placement group is missing information
>> about
>> writes that may have occurred, or does not have any healthy copies. If you
>> see
>> this state, try to start any failed OSDs that may contain the needed
>> information."
>>
>> >
>> > Here we shut down OSDs 66, 15 and 73, then proceeded with the operations
>> > below.
>> >
>> > #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-135 --op list-pgs
>> > #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-135 --pgid 1.e4b --op remove
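The tool operates on one stopped OSD at a time, and the default OSD data path is /var/lib/ceph/osd/<cluster>-<id>. A minimal Python sketch that only builds and prints the command lines for the procedure above (it never runs them, since --op remove is destructive); the helper name is hypothetical and only the options already used in this thread appear:

```python
import shlex

def objectstore_tool_argv(osd_id, op, pgid=None, cluster="ceph"):
    """Build (but do not run) a ceph-objectstore-tool command for one OSD.

    Assumes the default data path /var/lib/ceph/osd/<cluster>-<id>; the OSD
    must be stopped before the tool is pointed at it.
    """
    argv = ["ceph-objectstore-tool",
            "--data-path", f"/var/lib/ceph/osd/{cluster}-{osd_id}"]
    if pgid is not None:
        argv += ["--pgid", pgid]
    argv += ["--op", op]
    return argv

# Dry run: print the commands instead of executing them.
for argv in (objectstore_tool_argv(135, "list-pgs"),
             objectstore_tool_argv(135, "remove", pgid="1.e4b")):
    print(shlex.join(argv))
```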
>> >
>> > Please confirm whether we are following the correct procedure for removing
>> > PGs.
>>
>> There are multiple threads about that on this very list; "pgs stuck
>> inactive" is a recent example.
>>
>> >
>> > #ceph pg stat
>> > v2724830: 4096 pgs: 1 active+clean+scrubbing+deep+repair, 1
>> > down+remapped,
>> > 21 remapped+incomplete, 4073 active+clean; 268 TB data, 371 TB used, 267
>> > TB
>> > / 638 TB avail
>> >
>> > # ceph -s
>> > 2017-03-29 18:23:44.288508 7f8c2b8e5700 -1 WARNING: the following
>> > dangerous
>> > and experimental features are enabled: bluestore,rocksdb
>> > 2017-03-29 18:23:44.304692 7f8c2b8e5700 -1 WARNING: the following
>> > dangerous
>> > and experimental features are enabled: bluestore,rocksdb
>> >     cluster bd8adcd0-c36d-4367-9efe-f48f5ab5f108
>> >      health HEALTH_ERR
>> >             22 pgs are stuck inactive for more than 300 seconds
>> >             1 pgs down
>> >             21 pgs incomplete
>> >             1 pgs repair
>> >             22 pgs stuck inactive
>> >             22 pgs stuck unclean
>> >      monmap e2: 5 mons at
>> >
>> > {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
>> >             election epoch 172, quorum 0,1,2,3,4
>> > au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide
>> >         mgr active: au-brisbane
>> >      osdmap e6284: 118 osds: 117 up, 117 in; 22 remapped pgs
>>
>> What is the status of the down+out osd? What role did/does it play? Most
>> importantly, is it osd.6?
>>
>> >             flags sortbitwise,require_jewel_osds,require_kraken_osds
>> >       pgmap v2724830: 4096 pgs, 1 pools, 268 TB data, 197 Mobjects
>> >             371 TB used, 267 TB / 638 TB avail
>> >                 4073 active+clean
>> >                   21 remapped+incomplete
>> >                    1 down+remapped
>> >                    1 active+clean+scrubbing+deep+repair
>> >
>> >
>> > #ceph osd dump | grep pool
>> > pool 1 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash
>> > rjenkins pg_num 4096 pgp_num 4096 last_change 456 flags
>> > hashpspool,nodeep-scrub stripe_width 65536
>> >
>> >
>> >
>> > Can you please suggest a way to wipe out these incomplete PGs?
>>
>> See the thread mentioned previously. Take note of the force_create_pg step.
>>
>> > Why did ceph pg repair fail in this scenario?
>> > How can we recover the incomplete PGs to an active state?
>> >
>> > The pg query for the affected PG ended with this error. Can you please
>> > explain what it means?
>> > ---
>> >                 "15(2)",
>> >                 "66(1)",
>> >                 "73(3)",
>> >                 "103(4)",
>> >                 "113(0)"
>> >             ],
>> >             "down_osds_we_would_probe": [
>> >                 6
>> >             ],
>> >             "peering_blocked_by": [],
>> >             "peering_blocked_by_detail": [
>> >                 {
>> >                     "detail": "peering_blocked_by_history_les_bound"
>> >                 }
>> > ----
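To pull these fields out of a full "ceph pg 1.e4b query" dump without scrolling through it, a recursive key search is enough. A minimal Python sketch; the helper is hypothetical, and the nesting of the sample document (a "recovery_state" list) is an assumption, while the keys and values are the ones quoted above:

```python
import json

def find_keys(node, wanted, found=None):
    """Recursively collect every value stored under any key in `wanted`."""
    if found is None:
        found = {key: [] for key in wanted}
    if isinstance(node, dict):
        for key, value in node.items():
            if key in wanted:
                found[key].append(value)
            find_keys(value, wanted, found)
    elif isinstance(node, list):
        for item in node:
            find_keys(item, wanted, found)
    return found

# Trimmed stand-in for the query output; the nesting is assumed, the keys and
# values are the ones quoted above.
query = json.loads("""
{"recovery_state": [
  {"down_osds_we_would_probe": [6],
   "peering_blocked_by": [],
   "peering_blocked_by_detail": [
     {"detail": "peering_blocked_by_history_les_bound"}]}]}
""")

hits = find_keys(query, {"down_osds_we_would_probe", "peering_blocked_by_detail"})
print(hits["down_osds_we_would_probe"])  # [[6]]
print([d["detail"] for group in hits["peering_blocked_by_detail"] for d in group])
```

Against the attached file, loading it with json.load(open("/tmp/1.e4b-pg.txt")) should take the place of the stand-in document.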
>>
>> During multiple intervals osd 6 was in the up/acting set, for example:
>>
>>                 {
>>                     "first": 1608,
>>                     "last": 1645,
>>                     "maybe_went_rw": 1,
>>                     "up": [
>>                         113,
>>                         6,
>>                         15,
>>                         73,
>>                         103
>>                     ],
>>                     "acting": [
>>                         113,
>>                         6,
>>                         15,
>>                         73,
>>                         103
>>                     ],
>>                     "primary": 113,
>>                     "up_primary": 113
>>                 },
>>
>> Because the PG may have gone rw during that interval, we need to query osd 6,
>> and that is what is blocking progress.
>>
>>             "blocked_by": [
>>                 6
>>             ],
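The same check can be applied to every past interval in the query output. A minimal Python sketch, assuming the intervals have already been extracted as a list of objects shaped like the one quoted above; the helper name is hypothetical, and the second interval is invented purely for contrast:

```python
def rw_intervals_with(intervals, osd):
    """(first, last) of past intervals that may have gone read-write while
    `osd` was in the acting set, i.e. intervals peering wants to ask it about."""
    return [(iv["first"], iv["last"])
            for iv in intervals
            if iv.get("maybe_went_rw") and osd in iv.get("acting", [])]

intervals = [
    # Quoted from the pg query output above.
    {"first": 1608, "last": 1645, "maybe_went_rw": 1,
     "up": [113, 6, 15, 73, 103], "acting": [113, 6, 15, 73, 103],
     "primary": 113, "up_primary": 113},
    # Hypothetical interval, for contrast: osd 6 is absent.
    {"first": 1646, "last": 1700, "maybe_went_rw": 1,
     "up": [113, 66, 15, 73, 103], "acting": [113, 66, 15, 73, 103],
     "primary": 113, "up_primary": 113},
]
print(rw_intervals_with(intervals, 6))  # [(1608, 1645)]
```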
>>
>> Setting osd_find_best_info_ignore_history_les to true may help, but then you
>> may need to mark the missing OSD lost or perform some other trickery (and
>> this...). I suspect your min_size is too low, especially for a cluster of
>> this size, but EC is not an area I know extensively, so I can't say
>> definitively. Some of your questions may be better suited to the ceph-devel
>> mailing list, by the way.
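On the min_size question: an EC k+m pool stores k+m shards per object and needs at least k of them to decode it, while min_size is the number of shards the acting set must have before the PG serves I/O. A simplified Python model of that interplay for the 4+1 profile in this thread (it ignores peering and history; the class and its messages are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class ECPool:
    k: int         # data chunks
    m: int         # coding chunks
    min_size: int  # pool min_size

    @property
    def size(self):
        return self.k + self.m

    def describe(self, shards_up):
        """Classify a PG given how many of its shards sit on up OSDs."""
        if shards_up < self.k:
            # Fewer than k shards: the object cannot be decoded at all.
            return "cannot be decoded from the available shards"
        if shards_up < self.min_size:
            return "decodable but inactive (below min_size)"
        return f"active, can lose {shards_up - self.k} more shard(s)"

pool = ECPool(k=4, m=1, min_size=4)
for up in range(pool.size, 2, -1):
    print(up, pool.describe(up))
# 5 active, can lose 1 more shard(s)
# 4 active, can lose 0 more shard(s)
# 3 cannot be decoded from the available shards
```

In this model, min_size = k = 4 means the PG keeps accepting writes with no remaining redundancy, and lowering min_size below k cannot help, because fewer than k shards cannot be decoded.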
>>
>> >
>> > Attaching "ceph pg 1.e4b query > /tmp/1.e4b-pg.txt" file with this mail.
>> >
>> > Thanks
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> Cheers,
>> Brad
>
>

-- 
Cheers,
Brad