Hello Brad,
Many thanks for the info :)
ENV: Kraken - bluestore - EC 4+1 - 5 node cluster - RHEL7
What is the status of the down+out osd? -- Only one OSD, osd.6, is down and out of the cluster.
What role did/does it play? Most importantly, is it osd.6? -- Yes, it is osd.6; we removed this device from the cluster due to an underlying I/O error.
I set the parameter "osd_find_best_info_ignore_history_les = true" in ceph.conf and found that those 22 PGs changed to "down+remapped". They have now all reverted to the "remapped+incomplete" state.
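For completeness, this is roughly the shape of the change (a sketch only; the [osd] section placement is my assumption, and the injectargs line is just one possible way to push the option at runtime, otherwise the OSDs need a restart to pick it up):
===
## /etc/ceph/ceph.conf on the OSD nodes (assumed [osd] section)
[osd]
osd_find_best_info_ignore_history_les = true

## restart the OSDs so the setting takes effect
# systemctl restart ceph-osd.target

## assumed runtime alternative to a restart
# ceph tell osd.* injectargs '--osd_find_best_info_ignore_history_les=true'
===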
#ceph pg stat 2> /dev/null
v2731828: 4096 pgs: 1 incomplete, 21 remapped+incomplete, 4074 active+clean; 268 TB data, 371 TB used, 267 TB / 638 TB avail
## ceph -s
2017-03-30 19:02:14.350242 7f8b0415f700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2017-03-30 19:02:14.366545 7f8b0415f700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
cluster bd8adcd0-c36d-4367-9efe-f48f5ab5f108
health HEALTH_ERR
22 pgs are stuck inactive for more than 300 seconds
22 pgs incomplete
22 pgs stuck inactive
22 pgs stuck unclean
monmap e2: 5 mons at {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
election epoch 180, quorum 0,1,2,3,4 au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide
mgr active: au-adelaide
osdmap e6506: 117 osds: 117 up, 117 in; 21 remapped pgs
flags sortbitwise,require_jewel_osds,require_kraken_osds
pgmap v2731828: 4096 pgs, 1 pools, 268 TB data, 197 Mobjects
371 TB used, 267 TB / 638 TB avail
4074 active+clean
21 remapped+incomplete
1 incomplete
## ceph osd dump 2>/dev/null | grep cdvr
pool 1 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 456 flags hashpspool,nodeep-scrub stripe_width 65536
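For reference, the erasure-code profile behind this pool can be checked as below (a sketch; the argument of the second command is a placeholder for whatever profile name the first command returns):
===
# ceph osd pool get cdvr_ec erasure_code_profile
# ceph osd erasure-code-profile get <profile-name-from-above>
===
For EC 4+1 this should report k=4 and m=1, which is relevant to the min_size question further down.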
Inspecting the affected PG 1.e4b:
# ceph pg dump 2> /dev/null | grep 1.e4b
1.e4b 50832 0 0 0 0 73013340821 10006 10006 remapped+incomplete 2017-03-30 14:14:26.297098 3844'161662 6506:325748 [113,66,15,73,103] 113 [NONE,NONE,NONE,73,NONE] 73 1643'139486 2017-03-21 04:56:16.683953 0'0 2017-02-21 10:33:50.012922
When I triggered the below command:
#ceph pg force_create_pg 1.e4b
pg 1.e4b now creating, ok
The PG went to the creating state and there has been no change since. Can you explain why this PG shows null values after triggering "force_create_pg"?
# ceph pg dump 2> /dev/null | grep 1.e4b
1.e4b 0 0 0 0 0 0 0 0 creating 2017-03-30 19:07:00.982178 0'0 0:0 [] -1 [] -1 0'0 0.000000 0'0 0.000000
Then I triggered the below command:
# ceph pg repair 1.e4b
Error EAGAIN: pg 1.e4b has no primary osd --<<
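My reading of the dump above (please correct me if this is wrong): after "force_create_pg" the PG sits in creating with an empty up/acting set and primary -1, which is also why "ceph pg repair" returns "no primary osd". Below is a sketch of the non-destructive checks I can run to see what the cluster currently thinks about this PG:
===
# ceph pg map 1.e4b       ## which OSDs CRUSH currently maps this PG to
# ceph pg 1.e4b query     ## peering state and whatever is blocking it
# ceph osd tree           ## confirm osd.6 is really gone from the CRUSH map
===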
Could you please provide answers to the below queries?
1. How do we fix this "incomplete+remapped" PG issue? All OSDs here are up and running, and the affected OSD was marked out and removed from the cluster.
2. Will reducing min_size help? It is currently set to 4. Could you please explain the impact of reducing min_size for the current EC 4+1 config? (My own reading is sketched after the command block below.)
3. Is there any procedure to safely remove an affected PG? As far as I understand, this is the relevant command (a safer sequence, with an export first, is sketched after the block below):
===
#ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph --pgid 1.e4b --op remove
===
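On query 2, my current understanding (please correct me if this is off): with EC 4+1 we have k=4 data chunks and m=1 coding chunk, so min_size is already at k=4; a PG cannot serve I/O with fewer than 4 shards available, so dropping min_size below 4 should not bring these PGs back. A sketch of how it can be checked or set, in case it helps:
===
# ceph osd pool get cdvr_ec min_size
# ceph osd pool set cdvr_ec min_size 4
===
On query 3, before removing any PG shard I intend to stop that OSD and export the shard first, so it could be imported back with --op import if ever needed. A sketch only, with <id> as a placeholder for whichever OSD holds the shard:
===
# systemctl stop ceph-osd@<id>
## keep a copy of the shard before removing it
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --pgid 1.e4b \
      --op export --file /tmp/1.e4b.osd-<id>.export
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --pgid 1.e4b \
      --op remove
# systemctl start ceph-osd@<id>
===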
Awaiting your suggestions on how to proceed.
Thanks
On Thu, Mar 30, 2017 at 7:32 AM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
On Thu, Mar 30, 2017 at 4:53 AM, nokia ceph <nokiacephusers@xxxxxxxxx> wrote:
> Hello,
>
> Env:-
> 5 node, EC 4+1 bluestore kraken v11.2.0 , RHEL7.2
>
> As part of our resillency testing with kraken bluestore, we face more PG's
> were in incomplete+remapped state. We tried to repair each PG using "ceph pg
> repair <pgid>" still no luck. Then we planned to remove incomplete PG's
> using below procedure.
>
>
> #ceph health detail | grep 1.e4b
> pg 1.e4b is remapped+incomplete, acting [2147483647,66,15,73,2147483647]
> (reducing pool cdvr_ec min_size from 4 may help; search ceph.com/docs for
> 'incomplete')

"Incomplete Ceph detects that a placement group is missing information about
writes that may have occurred, or does not have any healthy copies. If you see
this state, try to start any failed OSDs that may contain the needed
information."
>
> Here we shutdown the OSD's 66,15 and 73 then proceeded with below operation.
>
> #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-135 --op list-pgs
> #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-135 --pgid 1.e4b
> --op remove
>
> Please confirm that we are following the correct procedure to removal of
> PG's
There are multiple threads about that on this very list "pgs stuck inactive"
recently for example.
>
> #ceph pg stat
> v2724830: 4096 pgs: 1 active+clean+scrubbing+deep+repair, 1 down+remapped,
> 21 remapped+incomplete, 4073 active+clean; 268 TB data, 371 TB used, 267 TB
> / 638 TB avail
>
> # ceph -s
> 2017-03-29 18:23:44.288508 7f8c2b8e5700 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2017-03-29 18:23:44.304692 7f8c2b8e5700 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> cluster bd8adcd0-c36d-4367-9efe-f48f5ab5f108
> health HEALTH_ERR
> 22 pgs are stuck inactive for more than 300 seconds
> 1 pgs down
> 21 pgs incomplete
> 1 pgs repair
> 22 pgs stuck inactive
> 22 pgs stuck unclean
> monmap e2: 5 mons at
> {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
> election epoch 172, quorum 0,1,2,3,4
> au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide
> mgr active: au-brisbane
> osdmap e6284: 118 osds: 117 up, 117 in; 22 remapped pgs

What is the status of the down+out osd? What role did/does it play? Most
importantly, is it osd.6?

> flags sortbitwise,require_jewel_osds,require_kraken_osds
> pgmap v2724830: 4096 pgs, 1 pools, 268 TB data, 197 Mobjects
> 371 TB used, 267 TB / 638 TB avail
> 4073 active+clean
> 21 remapped+incomplete
> 1 down+remapped
> 1 active+clean+scrubbing+deep+repair
>
>
> #ceph osd dump | grep pool
> pool 1 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash
> rjenkins pg_num 4096 pgp_num 4096 last_change 456 flags
> hashpspool,nodeep-scrub stripe_width 65536
>
>
>
> Can you please suggest is there any way to wipe out these incomplete PG's.
> Why ceph pg repair failed in this scenerio?
> How to recover incomplete PG's to active state.

See the thread previously mentioned. Take note of the force_create_pg step.
>
> pg query for the affected PG ended with this error. Can you please explain
> what is meant by this ?
> ---
> "15(2)",
> "66(1)",
> "73(3)",
> "103(4)",
> "113(0)"
> ],
> "down_osds_we_would_probe": [
> 6
> ],
> "peering_blocked_by": [],
> "peering_blocked_by_detail": [
> {
> "detail": "peering_blocked_by_history_les_bound" During multiple intervals osd 6 was in the up/acting set, for example;
> }
> ----
{
"first": 1608,
"last": 1645,
"maybe_went_rw": 1,
"up": [
113,
6,
15,
73,
103
],
"acting": [
113,
6,
15,
73,
103
],
"primary": 113,
"up_primary": 113
},
Because we may have gone rw during that interval we need to query it and it is blocking progress.
"blocked_by": [
6
],
Setting osd_find_best_info_ignore_history_les to true may help but then you may
need to mark the missing OSD lost or perform some other trickery (and this . I
suspect your min_size is too low, especially for a cluster of this size, but EC
is not an area I know extensively so I can't say definitively. Some of your
questions may be better suited to the ceph-devel mailing list by the way.
>
> Attaching "ceph pg 1.e4b query > /tmp/1.e4b-pg.txt" file with this mail.
>
> Thanks
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
--
Cheers,
Brad