Re: pg stuck in peering while power failure

        {
            "name": "Started\/Primary\/Peering",
            "enter_time": "2017-01-10 13:43:34.933074",
            "past_intervals": [
                {
                    "first": 75858,
                    "last": 75860,
                    "maybe_went_rw": 1,
                    "up": [
                        345,
                        622,
                        685,
                        183,
                        792,
                        2147483647,
                        2147483647,
                        401,
                        516
                    ],
                    "acting": [
                        345,
                        622,
                        685,
                        183,
                        792,
                        2147483647,
                        2147483647,
                        401,
                        516
                    ],
                    "primary": 345,
                    "up_primary": 345
                },

Between epochs 75858 and 75860,

                        345,
                        622,
                        685,
                        183,
                        792,
                        2147483647,
                        2147483647,
                        401,
                        516

was the acting set.  The current acting set

                    345,
                    622,
                    685,
                    183,
                    2147483647,
                    2147483647,
                    153,
                    401,
                    516

needs *all 7* of the OSDs from epochs 75858 through 75860 to be sure
it has seen any writes completed during that time.  You can make
transient situations like that less of a problem by setting min_size
to 8 (though it'll block writes whenever 2 shards are down, until
backfill completes).  A possible enhancement for an EC pool would be
to gather the infos from those OSDs anyway and use them to rule out
writes (if writes actually happened, you'd still be stuck).
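
[Editor's note: the 2147483647 entries in the acting sets above are 2**31 - 1,
the sentinel Ceph prints for an EC shard with no OSD mapped (CRUSH_ITEM_NONE).
A minimal sketch of the probing logic Sam describes, with the sets hard-coded
from the pg query above; this is an illustration, not Ceph's actual code:]

```python
# 2147483647 == 2**31 - 1 is Ceph's CRUSH_ITEM_NONE: "no OSD for this shard".
CRUSH_ITEM_NONE = 2147483647

def live_shards(acting):
    """Return the OSD ids that actually hold shards in this acting set."""
    return [osd for osd in acting if osd != CRUSH_ITEM_NONE]

# Acting set during epochs 75858-75860 (maybe_went_rw = 1):
past = [345, 622, 685, 183, 792, CRUSH_ITEM_NONE, CRUSH_ITEM_NONE, 401, 516]
# Current acting set (osd.153 replaced osd.792 in a different slot):
current = [345, 622, 685, 183, CRUSH_ITEM_NONE, CRUSH_ITEM_NONE, 153, 401, 516]

# Peering must hear from every OSD of the past interval that might have
# acknowledged a write; any of them not in the current set must be probed.
must_probe = set(live_shards(past)) - set(live_shards(current))

print(len(live_shards(past)))  # → 7 OSDs that may have gone read-write
print(must_probe)              # → {792}, the down OSD blocking peering
```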
-Sam
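
[Editor's note: the min_size change Sam suggests is applied with
`ceph osd pool set <pool> min_size 8` (pool name is a placeholder).
The trade-off can be sketched with a bit of arithmetic; a sketch under
the thread's 7+2 profile, not Ceph's implementation:]

```python
# Sketch: write-availability trade-off for a k=7, m=2 EC pool.
# min_size is how many up shards the PG needs before it accepts I/O.
k, m = 7, 2
size = k + m  # 9 shards per object

for min_size in (k, k + 1):  # 7 (the risky setting) vs 8 (Sam's suggestion)
    # With min_size shards required, this many concurrent OSD failures
    # still leave the PG writeable (until backfill restores the rest).
    failures_tolerated = size - min_size
    print(f"min_size={min_size}: keeps writing through "
          f"{failures_tolerated} failure(s)")

# With min_size=8, any interval that went read-write had >= 8 live shards,
# so losing one of those OSDs later still leaves k=7 to peer against.
```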

On Tue, Jan 10, 2017 at 5:36 AM, Craig Chi <craigchi@xxxxxxxxxxxx> wrote:
> Hi List,
>
> I am testing the stability of my Ceph cluster with power failure.
>
> I brutally powered off 2 Ceph units with each 90 OSDs on it while the client
> I/O was continuing.
>
> Since then, some of the pgs in my cluster have been stuck in peering:
>
>       pgmap v3261136: 17408 pgs, 4 pools, 176 TB data, 5082 kobjects
>             236 TB used, 5652 TB / 5889 TB avail
>             8563455/38919024 objects degraded (22.003%)
>                13526 active+undersized+degraded
>                 3769 active+clean
>                  104 down+remapped+peering
>                    9 down+peering
>
> I queried the peering pg (all on EC pool with 7+2) and got blocked
> information (full query: http://pastebin.com/pRkaMG2h )
>
>             "probing_osds": [
>                 "153(6)",
>                 "183(3)",
>                 "345(0)",
>                 "401(7)",
>                 "516(8)",
>                 "622(1)",
>                 "685(2)"
>             ],
>             "blocked": "peering is blocked due to down osds",
>             "down_osds_we_would_probe": [
>                 792
>             ],
>             "peering_blocked_by": [
>                 {
>                     "osd": 792,
>                     "current_lost_at": 0,
>                     "comment": "starting or marking this osd lost may let us
> proceed"
>                 }
>             ]
>
>
> osd.792 is on one of the units I powered off, and I think the I/O
> associated with this pg is paused too.
>
> I have checked the troubleshooting page on the Ceph website (
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
> ); it says that starting the OSD or marking it lost can make the procedure
> continue.
>
> I am sure my cluster was healthy before the power outage occurred. I am
> wondering: if a power outage like this happens in a production environment,
> will it also freeze my client I/O if I don't do anything? Since I only lost
> 2 redundancies (I am using erasure coding with 7+2), I think the cluster
> should still serve I/O normally.
>
> Or am I doing something wrong? Please give me some suggestions, thanks.
>
> Sincerely,
> Craig Chi
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


