Re: Lost 1/40 OSDs at EC 4+1, now PGs are incomplete

Yes, if you set it back to 5, then every time you lose an OSD you'll have to set min_size to 4 and let the rebuild take place before putting it back to 5.

I guess it all comes down to how important 100% uptime is, weighed against you manually monitoring the backfill / fixing the OSD / replacing the OSD after dropping to 4 each time, vs letting it do this automatically and risking a further OSD loss.

If you have the space I'd suggest going to 4 + 2 and migrating your data; this would remove the ongoing issue and give you some extra protection against OSD loss.
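To sketch roughly what that would look like (the profile name, pool name and PG count below are just examples, and for CephFS you'd then need to add the new pool as a data pool and move the files / layouts onto it):

---
# create a 4+2 profile, keeping the failure domain at osd like the current profile
ceph osd erasure-code-profile set ec-42-profile k=4 m=2 crush-failure-domain=osd

# create a new EC pool using that profile
ceph osd pool create media-ec42 256 256 erasure ec-42-profile
---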

On Wed, Dec 12, 2018 at 11:43 AM David Young <funkypenguin@xxxxxxxxxxxxxx> wrote:
(accidentally forgot to reply to the list)

Thank you, setting min_size to 4 allowed I/O again, and the 39 incomplete PGs are now:

39  active+undersized+degraded+remapped+backfilling

Once backfilling is done, I'll increase min_size to 5 again.
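(For the record, assuming the pool name "media" from the health output quoted below, that will be something like:)

---
# watch backfill progress
ceph -s

# once the PGs are active+clean again, restore the original value
ceph osd pool set media min_size 5
---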

Am I likely to encounter this issue whenever I lose an OSD (I/O freezes and manually reducing min_size is required), and is there anything I should be doing differently?

Thanks again!
D



Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, December 12, 2018 3:31 PM, Ashley Merrick <singapore@xxxxxxxxxxxxxx> wrote:

With EC the min_size is set to K + 1.

Generally EC is used with an M of 2 or more; the reason min_size is set to K + 1 is that if it's reduced to K, you are in a state where a further OSD loss will leave some PGs without at least K shards available, as you only have 1 extra M.

As per the error, you can get your pool back online by setting min_size to 4.

However, this should only be a temporary fix while you get the OSD back online / rebuilt so you can go back to your 4 + 1 state.
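A sketch of the command, using the pool name "media" from your health warnings:

---
ceph osd pool set media min_size 4
---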

, Ash

On Wed, 12 Dec 2018 at 10:27 AM, David Young <funkypenguin@xxxxxxxxxxxxxx> wrote:
Hi all,

I have a small 2-node cluster with 40 OSDs, using erasure coding 4+1

I lost osd38, and now I have 39 incomplete PGs.

---
PG_AVAILABILITY Reduced data availability: 39 pgs inactive, 39 pgs incomplete
    pg 22.2 is incomplete, acting [19,33,10,8,29] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
    pg 22.f is incomplete, acting [17,9,23,14,15] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
    pg 22.12 is incomplete, acting [7,33,10,31,29] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
    pg 22.13 is incomplete, acting [23,0,15,33,13] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
    pg 22.23 is incomplete, acting [29,17,18,15,12] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
<snip>
---

My EC profile is below:

---
root@prod1:~# ceph osd erasure-code-profile get ec-41-profile
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=1
plugin=jerasure
technique=reed_sol_van
w=8
---
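(For reference, the profile was created with something along these lines — the remaining fields shown above are just the jerasure defaults:)

---
ceph osd erasure-code-profile set ec-41-profile k=4 m=1 crush-failure-domain=osd
---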

When I query one of the incomplete PGs, I see this:

---
    "recovery_state": [
        {
            "name": "Started/Primary/Peering/Incomplete",
            "enter_time": "2018-12-11 20:46:11.645796",
            "comment": "not enough complete instances of this PG"
        },
---

And this:

---
            "probing_osds": [
                "0(4)",
                "7(2)",
                "9(1)",
                "11(4)",
                "22(3)",
                "29(2)",
                "36(0)"
            ],
            "down_osds_we_would_probe": [
                38
            ],
            "peering_blocked_by": []
        },
---

I have set this in /etc/ceph/ceph.conf to no effect:
   osd_find_best_info_ignore_history_les = true
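Since ceph.conf is only read when the daemons start, I understand the runtime equivalent would be something like the following (though I'm not sure it has any effect until the PG re-peers):

---
ceph tell osd.* injectargs '--osd_find_best_info_ignore_history_les=true'
---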


As a result of the incomplete PGs, I/O is currently frozen to at least part of my cephfs.

I expected to be able to tolerate the loss of an OSD without issue. Is there anything I can do to restore these incomplete PGs?

When I bring back a new osd38, I see:
---
            "probing_osds": [
                "4(2)",
                "11(3)",
                "22(1)",
                "24(1)",
                "26(2)",
                "36(4)",
                "38(1)",
                "39(0)"
            ],
            "down_osds_we_would_probe": [],
            "peering_blocked_by": []
        },
        {
            "name": "Started",
            "enter_time": "2018-12-11 21:06:35.307379"
        }
---

But my recovery state is still:

---
    "recovery_state": [
        {
            "name": "Started/Primary/Peering/Incomplete",
            "enter_time": "2018-12-11 21:06:35.320292",
            "comment": "not enough complete instances of this PG"
        },
---

Any ideas?

Thanks!
D

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
