Re: Lost 1/40 OSDs at EC 4+1, now PGs are incomplete

Yes, if you set it back to 5, then every time you lose an OSD you'll have to set min_size to 4 and let the rebuild take place before putting it back to 5.

I guess it all comes down to how important 100% uptime is, weighed against you manually monitoring the backfill / fixing the OSD / replacing the OSD after dropping to 4 each time, vs letting it do this automatically and risking a further OSD loss.

If you have the space I'd suggest going to 4 + 2 and migrating your data; this would remove the ongoing issue and give you some extra protection against OSD loss.
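To sketch roughly what that would look like (the profile name, pool name and PG count below are just examples, and for CephFS you'd then need to add the new pool as a data pool and move the files / layouts onto it):

---
# create a 4+2 profile, keeping the failure domain at osd like the current profile
ceph osd erasure-code-profile set ec-42-profile k=4 m=2 crush-failure-domain=osd

# create a new EC pool using that profile
ceph osd pool create media-ec42 256 256 erasure ec-42-profile
---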

On Wed, Dec 12, 2018 at 11:43 AM David Young <funkypenguin@xxxxxxxxxxxxxx> wrote:
(accidentally forgot to reply to the list)

Thank you, setting min_size to 4 allowed I/O again, and the 39 incomplete PGs are now:

39  active+undersized+degraded+remapped+backfilling

Once backfilling is done, I'll increase min_size to 5 again.
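(For the record, assuming the pool name "media" from the health output quoted below, that will be something like:)

---
# watch backfill progress
ceph -s

# once the PGs are active+clean again, restore the original value
ceph osd pool set media min_size 5
---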

Am I likely to encounter this issue whenever I lose an OSD (I/O freezes and manually reducing min_size is required), and is there anything I should be doing differently?

Thanks again!
D



Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, December 12, 2018 3:31 PM, Ashley Merrick <singapore@xxxxxxxxxxxxxx> wrote:

With EC the min_size is set to K + 1.

Generally EC is used with an M of 2 or more; the reason min_size is set to K + 1 is that if it's reduced to K, you are in a state where a further OSD loss will leave some PGs without at least K shards available, as you only have 1 extra M.

As per the error, you can get your pool back online by setting min_size to 4.

However, this should only be a temporary fix while you get the OSD back online / rebuilt so you can go back to your 4 + 1 state.
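A sketch of the command, using the pool name "media" from your health warnings:

---
ceph osd pool set media min_size 4
---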

, Ash

On Wed, 12 Dec 2018 at 10:27 AM, David Young <funkypenguin@xxxxxxxxxxxxxx> wrote:
Hi all,

I have a small 2-node cluster with 40 OSDs, using erasure coding 4+1

I lost osd38, and now I have 39 incomplete PGs.

---
PG_AVAILABILITY Reduced data availability: 39 pgs inactive, 39 pgs incomplete
    pg 22.2 is incomplete, acting [19,33,10,8,29] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
    pg 22.f is incomplete, acting [17,9,23,14,15] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
    pg 22.12 is incomplete, acting [7,33,10,31,29] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
    pg 22.13 is incomplete, acting [23,0,15,33,13] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
    pg 22.23 is incomplete, acting [29,17,18,15,12] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
<snip>
---

My EC profile is below:

---
root@prod1:~# ceph osd erasure-code-profile get ec-41-profile
crush-device-class=
crush-failure-domain=osd
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=1
plugin=jerasure
technique=reed_sol_van
w=8
---
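(For reference, the profile was created with something along these lines — the remaining fields shown above are just the jerasure defaults:)

---
ceph osd erasure-code-profile set ec-41-profile k=4 m=1 crush-failure-domain=osd
---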

When I query one of the incomplete PGs, I see this:

---
    "recovery_state": [
        {
            "name": "Started/Primary/Peering/Incomplete",
            "enter_time": "2018-12-11 20:46:11.645796",
            "comment": "not enough complete instances of this PG"
        },
---

And this:

---
            "probing_osds": [
                "0(4)",
                "7(2)",
                "9(1)",
                "11(4)",
                "22(3)",
                "29(2)",
                "36(0)"
            ],
            "down_osds_we_would_probe": [
                38
            ],
            "peering_blocked_by": []
        },
---

I have set this in /etc/ceph/ceph.conf to no effect:
   osd_find_best_info_ignore_history_les = true
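Since ceph.conf is only read when the daemons start, I understand the runtime equivalent would be something like the following (though I'm not sure it has any effect until the PG re-peers):

---
ceph tell osd.* injectargs '--osd_find_best_info_ignore_history_les=true'
---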


As a result of the incomplete PGs, I/O is currently frozen to at least part of my cephfs.

I expected to be able to tolerate the loss of an OSD without issue. Is there anything I can do to restore these incomplete PGs?

When I bring back a new osd38, I see:
---
            "probing_osds": [
                "4(2)",
                "11(3)",
                "22(1)",
                "24(1)",
                "26(2)",
                "36(4)",
                "38(1)",
                "39(0)"
            ],
            "down_osds_we_would_probe": [],
            "peering_blocked_by": []
        },
        {
            "name": "Started",
            "enter_time": "2018-12-11 21:06:35.307379"
        }
---

But my recovery state is still:

---
    "recovery_state": [
        {
            "name": "Started/Primary/Peering/Incomplete",
            "enter_time": "2018-12-11 21:06:35.320292",
            "comment": "not enough complete instances of this PG"
        },
---

Any ideas?

Thanks!
D

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
