On Fri, Mar 31, 2017 at 5:19 AM, nokia ceph <nokiacephusers@xxxxxxxxx> wrote:
> Hello Brad,
>
> Many thanks for the info :)
>
> ENV: Kraken - bluestore - EC 4+1 - 5-node cluster - RHEL7
>
> What is the status of the down+out osd? Only one OSD, osd.6, is down and
> out from the cluster.
> What role did/does it play? Most importantly, is it osd.6? Yes, due to an
> underlying I/O error we removed this device from the cluster.
>
> I put the parameter "osd_find_best_info_ignore_history_les = true" in
> ceph.conf and found that those 22 PGs changed to "down+remapped". Now all
> have reverted to the "remapped+incomplete" state.
>
> #ceph pg stat 2> /dev/null
> v2731828: 4096 pgs: 1 incomplete, 21 remapped+incomplete, 4074 active+clean;
> 268 TB data, 371 TB used, 267 TB / 638 TB avail
>
> ## ceph -s
> 2017-03-30 19:02:14.350242 7f8b0415f700 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
> 2017-03-30 19:02:14.366545 7f8b0415f700 -1 WARNING: the following dangerous
> and experimental features are enabled: bluestore,rocksdb
>     cluster bd8adcd0-c36d-4367-9efe-f48f5ab5f108
>      health HEALTH_ERR
>             22 pgs are stuck inactive for more than 300 seconds
>             22 pgs incomplete
>             22 pgs stuck inactive
>             22 pgs stuck unclean
>      monmap e2: 5 mons at
> {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
>             election epoch 180, quorum 0,1,2,3,4

Are you *actually* trying to create a cluster that is as geographically
dispersed as these machine names indicate?

>             au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide
>      mgr active: au-adelaide
>      osdmap e6506: 117 osds: 117 up, 117 in; 21 remapped pgs
>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>       pgmap v2731828: 4096 pgs, 1 pools, 268 TB data, 197 Mobjects
>             371 TB used, 267 TB / 638 TB avail
>                 4074 active+clean
>                   21 remapped+incomplete
>                    1 incomplete
>
> ## ceph osd dump 2>/dev/null | grep cdvr
> pool 1 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash
> rjenkins pg_num 4096 pgp_num 4096 last_change 456 flags
> hashpspool,nodeep-scrub stripe_width 65536
>
> Inspecting affected PG 1.e4b
>
> # ceph pg dump 2> /dev/null | grep 1.e4b
> 1.e4b 50832 0 0 0 0 73013340821 10006 10006 remapped+incomplete
> 2017-03-30 14:14:26.297098 3844'161662 6506:325748 [113,66,15,73,103] 113
> [NONE,NONE,NONE,73,NONE] 73 1643'139486 2017-03-21 04:56:16.683953 0'0
> 2017-02-21 10:33:50.012922
>
> When I trigger the command below:
>
> #ceph pg force_create_pg 1.e4b
> pg 1.e4b now creating, ok
>
> It went to the creating state, with no change after that. Can you explain
> why this PG shows null values after triggering "force_create_pg"?
>
> ]# ceph pg dump 2> /dev/null | grep 1.e4b
> 1.e4b 0 0 0 0 0 0 0 0 creating 2017-03-30 19:07:00.982178 0'0 0:0 [] -1
> [] -1 0'0 0.000000 0'0 0.000000
>
> Then I triggered the command below:
>
> # ceph pg repair 1.e4b
> Error EAGAIN: pg 1.e4b has no primary osd --<<
>
> Could you please provide answers to the queries below?
>
> 1. How can we fix this "incomplete+remapped" PG issue? Here all OSDs were
> up and running, and the affected OSD was marked out and removed from the
> cluster.
> 2. Will reducing min_size help? It is currently set to 4. Could you please
> explain the impact if we reduce min_size for the current EC 4+1 config?
> 3. Is there any procedure to safely remove an affected PG? As per my
> understanding, I'm aware of this command:
>
> ===
> #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph --pgid 1.e4b --op remove
> ===
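If you do go down the removal path, it is usually worth exporting the PG
shard first so it can be re-imported should the removal turn out to be a
mistake. Roughly along these lines, run with the OSD stopped (the data path
and export file name below are only illustrative, not taken from your
cluster):

#ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-73 --pgid 1.e4b --op export --file /root/1.e4b-osd73.export

The matching --op import --file can later restore that export onto an OSD
if needed.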
>
> Awaiting your suggestions on how to proceed.
>
> Thanks
>
> On Thu, Mar 30, 2017 at 7:32 AM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>
>> On Thu, Mar 30, 2017 at 4:53 AM, nokia ceph <nokiacephusers@xxxxxxxxx>
>> wrote:
>> > Hello,
>> >
>> > Env:
>> > 5 node, EC 4+1, bluestore, kraken v11.2.0, RHEL7.2
>> >
>> > As part of our resiliency testing with kraken bluestore, we found that
>> > several PGs were in the incomplete+remapped state. We tried to repair
>> > each PG using "ceph pg repair <pgid>", still no luck. Then we planned
>> > to remove the incomplete PGs using the procedure below.
>> >
>> > #ceph health detail | grep 1.e4b
>> > pg 1.e4b is remapped+incomplete, acting [2147483647,66,15,73,2147483647]
>> > (reducing pool cdvr_ec min_size from 4 may help; search ceph.com/docs
>> > for 'incomplete')
>>
>> "Incomplete Ceph detects that a placement group is missing information
>> about writes that may have occurred, or does not have any healthy copies.
>> If you see this state, try to start any failed OSDs that may contain the
>> needed information."
>>
>> > Here we shut down OSDs 66, 15 and 73, then proceeded with the operation
>> > below.
>> >
>> > #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-135 --op list-pgs
>> > #ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-135 --pgid 1.e4b
>> > --op remove
>> >
>> > Please confirm that we are following the correct procedure for removal
>> > of PGs.
>>
>> There are multiple threads about that on this very list, "pgs stuck
>> inactive" recently for example.
>>
>> > #ceph pg stat
>> > v2724830: 4096 pgs: 1 active+clean+scrubbing+deep+repair, 1 down+remapped,
>> > 21 remapped+incomplete, 4073 active+clean; 268 TB data, 371 TB used,
>> > 267 TB / 638 TB avail
>> >
>> > # ceph -s
>> > 2017-03-29 18:23:44.288508 7f8c2b8e5700 -1 WARNING: the following dangerous
>> > and experimental features are enabled: bluestore,rocksdb
>> > 2017-03-29 18:23:44.304692 7f8c2b8e5700 -1 WARNING: the following dangerous
>> > and experimental features are enabled: bluestore,rocksdb
>> >     cluster bd8adcd0-c36d-4367-9efe-f48f5ab5f108
>> >      health HEALTH_ERR
>> >             22 pgs are stuck inactive for more than 300 seconds
>> >             1 pgs down
>> >             21 pgs incomplete
>> >             1 pgs repair
>> >             22 pgs stuck inactive
>> >             22 pgs stuck unclean
>> >      monmap e2: 5 mons at
>> > {au-adelaide=10.50.21.24:6789/0,au-brisbane=10.50.21.22:6789/0,au-canberra=10.50.21.23:6789/0,au-melbourne=10.50.21.21:6789/0,au-sydney=10.50.21.20:6789/0}
>> >             election epoch 172, quorum 0,1,2,3,4
>> >             au-sydney,au-melbourne,au-brisbane,au-canberra,au-adelaide
>> >      mgr active: au-brisbane
>> >      osdmap e6284: 118 osds: 117 up, 117 in; 22 remapped pgs
>>
>> What is the status of the down+out osd? What role did/does it play? Most
>> importantly, is it osd.6?
>>
>> >             flags sortbitwise,require_jewel_osds,require_kraken_osds
>> >       pgmap v2724830: 4096 pgs, 1 pools, 268 TB data, 197 Mobjects
>> >             371 TB used, 267 TB / 638 TB avail
>> >                 4073 active+clean
>> >                   21 remapped+incomplete
>> >                    1 down+remapped
>> >                    1 active+clean+scrubbing+deep+repair
>> >
>> > #ceph osd dump | grep pool
>> > pool 1 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash
>> > rjenkins pg_num 4096 pgp_num 4096 last_change 456 flags
>> > hashpspool,nodeep-scrub stripe_width 65536
>> >
>> > Can you please suggest whether there is any way to wipe out these
>> > incomplete PGs?
>>
>> See the thread previously mentioned. Take note of the force_create_pg step.
>>
>> > Why did ceph pg repair fail in this scenario?
>> > How do we recover incomplete PGs to the active state?
>> >
>> > pg query for the affected PG ended with this output. Can you please
>> > explain what is meant by this?
>> > ---
>> >                 "15(2)",
>> >                 "66(1)",
>> >                 "73(3)",
>> >                 "103(4)",
>> >                 "113(0)"
>> >             ],
>> >             "down_osds_we_would_probe": [
>> >                 6
>> >             ],
>> >             "peering_blocked_by": [],
>> >             "peering_blocked_by_detail": [
>> >                 {
>> >                     "detail": "peering_blocked_by_history_les_bound"
>> >                 }
>> > ----
>>
>> During multiple intervals osd.6 was in the up/acting set, for example:
>>
>>     {
>>         "first": 1608,
>>         "last": 1645,
>>         "maybe_went_rw": 1,
>>         "up": [
>>             113,
>>             6,
>>             15,
>>             73,
>>             103
>>         ],
>>         "acting": [
>>             113,
>>             6,
>>             15,
>>             73,
>>             103
>>         ],
>>         "primary": 113,
>>         "up_primary": 113
>>     },
>>
>> Because we may have gone rw during that interval we need to query it, and
>> it is blocking progress.
>>
>>     "blocked_by": [
>>         6
>>     ],
>>
>> Setting osd_find_best_info_ignore_history_les to true may help, but then
>> you may need to mark the missing OSD lost or perform some other trickery.
>> I suspect your min_size is too low, especially for a cluster of this size,
>> but EC is not an area I know extensively so I can't say definitively. Some
>> of your questions may be better suited to the ceph-devel mailing list, by
>> the way.
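To make that a little more concrete (a rough sketch only, not a tested
recipe for this cluster): the option is typically set under [osd] in
ceph.conf, at least for the OSDs in the affected PG, and the PG's primary
is restarted so it re-peers; if the data on the failed disk really is
unrecoverable, the OSD can then be marked lost. Using the primary (113) and
the missing OSD (6) from the output above:

[osd]
osd_find_best_info_ignore_history_les = true

#systemctl restart ceph-osd@113
#ceph osd lost 6 --yes-i-really-mean-it

Only mark osd.6 lost if you are sure its data cannot be brought back, and
remove the option again once the PGs have peered.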
>>
>> > Attaching the "ceph pg 1.e4b query > /tmp/1.e4b-pg.txt" file with this
>> > mail.
>> >
>> > Thanks
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>> --
>> Cheers,
>> Brad
>

--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com