Re: Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size


 



Ceph doesn't delete a copy if it can't find a new place to store it; this is a good thing.
Add one more server and you will see the data actually move elsewhere (without a health warning in Nautilus, with a health warning in older versions).
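
To make the distinction concrete, here is a rough Python sketch of how the state flags relate to the up set, the acting set, and the pool size. It is a simplified model, not Ceph's actual code: the function name `pg_flags` is made up for illustration, and it assumes that "remapped" means the acting set differs from the up set and that "undersized" means the acting set holds fewer OSDs than the pool size.

```python
from typing import List, Set


def pg_flags(up: List[int], acting: List[int], pool_size: int) -> Set[str]:
    """Simplified model of two PG state flags (hypothetical helper, not Ceph code)."""
    flags = set()
    # Assumption: a PG is "remapped" when the acting set differs from the up set.
    if up != acting:
        flags.add("remapped")
    # Assumption: a PG is "undersized" when the acting set is smaller than the pool size.
    if len(acting) < pool_size:
        flags.add("undersized")
    return flags


# PG 1.4d from this thread: the old copy on osd.4 is still in the acting set.
print(sorted(pg_flags(up=[6, 5], acting=[6, 5, 4], pool_size=3)))  # ['remapped']

# If the copy on osd.4 had been dropped, the PG would be short one replica.
print(sorted(pg_flags(up=[6, 5], acting=[6, 5], pool_size=3)))  # ['undersized']
```

Under this model the PG still has three copies in its acting set, so it is only remapped. It would become undersized (and, with data in it, presumably degraded) only if the third copy were actually gone.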


It's a little unfortunate that "ceph osd df" lies about the usage of OSDs that are out: they drop to 0 immediately. This used to work differently pre-Luminous (or was it pre-BlueStore?).


Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Sun, Jun 9, 2019 at 2:38 PM Tarek Zegar <tzegar@xxxxxxxxxx> wrote:

Hi Huang,

So you are suggesting that even though osd.4 in this case has weight 0, it's still getting new data written to it? I find that counter to what weight 0 means.

Thanks
Tarek




From: huang jun <hjwsm1989@xxxxxxxxx>
To: Tarek Zegar <tzegar@xxxxxxxxxx>
Cc: Paul Emmerich <paul.emmerich@xxxxxxxx>, Ceph Users <ceph-users@xxxxxxxxxxxxxx>
Date: 06/08/2019 05:27 AM
Subject: [EXTERNAL] Re: [ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size





I think the written data will also go to osd.4 in this case.
Because your osd.4 is not down, Ceph doesn't consider the PG to have any OSD down,
and it will replicate the data to all OSDs in the actingbackfill set.
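
As a rough illustration of that reasoning, the sketch below computes which OSDs would receive a write. It is only a model built on assumptions: the helper name `write_targets` is hypothetical, and it assumes the actingbackfill set is the acting set plus any up-set OSDs that are not yet in the acting set (the backfill targets).

```python
from typing import List


def write_targets(up: List[int], acting: List[int]) -> List[int]:
    """Hypothetical helper: OSDs that receive a write under the model above."""
    # Assumption: backfill targets are up-set OSDs missing from the acting set.
    backfill = [osd for osd in up if osd not in acting]
    # Assumption: writes go to the acting set plus the backfill targets.
    return acting + backfill


# PG 1.4d from this thread: osd.4 is still in the acting set, so it still gets writes.
print(write_targets(up=[6, 5], acting=[6, 5, 4]))  # [6, 5, 4]
```

If this model holds, weight 0 takes osd.4 out of the up set, but the OSD keeps receiving writes for as long as it remains in the acting set.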

Tarek Zegar <tzegar@xxxxxxxxxx> wrote on Fri, Jun 7, 2019 at 10:37 PM:
    Paul / All

    I'm not sure what warning you are referring to; I'm on Nautilus. The point I'm getting at is this: if you weight out all OSDs on one host in a cluster of 3 OSD hosts with 3 OSDs each (CRUSH rule = host) and then write to the cluster, it *should*, IMO, not just say remapped but undersized/degraded.


    See below: 1 of the 3 OSD hosts has ALL its OSDs marked out with weight = 0. When you write (say, using FIO), the PGs have *only* 2 OSDs in them (UP set), which is the pool min size. I don't understand why it isn't reporting undersized/degraded; this seems like a bug. Who cares that the acting set still has the 3 original OSDs in it? The actual data is only on 2 OSDs, which is a degraded state.


    root@hostadmin:~# ceph -s

      cluster:
        id:     33d41932-9df2-40ba-8e16-8dedaa4b3ef6
        health: HEALTH_WARN
                application not enabled on 1 pool(s)

      services:
        mon: 1 daemons, quorum hostmonitor1 (age 29m)
        mgr: hostmonitor1(active, since 31m)
        osd: 9 osds: 9 up, 6 in; 100 remapped pgs

      data:
        pools:   1 pools, 100 pgs
        objects: 520 objects, 2.0 GiB
        usage:   15 GiB used, 75 GiB / 90 GiB avail
        pgs:     520/1560 objects misplaced (33.333%)
                 100 active+clean+remapped


    root@hostadmin:~# ceph osd tree

    ID CLASS WEIGHT  TYPE NAME         STATUS REWEIGHT PRI-AFF
    -1       0.08817 root default
    -3       0.02939     host hostosd1
     0   hdd 0.00980         osd.0         up  1.00000 1.00000
     3   hdd 0.00980         osd.3         up  1.00000 1.00000
     6   hdd 0.00980         osd.6         up  1.00000 1.00000
    -5       0.02939     host hostosd2
     1   hdd 0.00980         osd.1         up        0 1.00000
     4   hdd 0.00980         osd.4         up        0 1.00000
     7   hdd 0.00980         osd.7         up        0 1.00000
    -7       0.02939     host hostosd3
     2   hdd 0.00980         osd.2         up  1.00000 1.00000
     5   hdd 0.00980         osd.5         up  1.00000 1.00000
     8   hdd 0.00980         osd.8         up  1.00000 1.00000



    root@hostadmin:~# ceph osd df

    ID CLASS WEIGHT  REWEIGHT SIZE   RAW USE DATA    OMAP   META     AVAIL   %USE  VAR  PGS STATUS
     0   hdd 0.00980  1.00000 10 GiB 1.7 GiB 765 MiB 12 KiB 1024 MiB 8.2 GiB 17.48 1.03  34     up
     3   hdd 0.00980  1.00000 10 GiB 1.7 GiB 765 MiB 12 KiB 1024 MiB 8.2 GiB 17.48 1.03  36     up
     6   hdd 0.00980  1.00000 10 GiB 1.6 GiB 593 MiB  4 KiB 1024 MiB 8.4 GiB 15.80 0.93  30     up
     1   hdd 0.00980        0    0 B     0 B     0 B    0 B      0 B     0 B     0    0   0     up
     4   hdd 0.00980        0    0 B     0 B     0 B    0 B      0 B     0 B     0    0   0     up
     7   hdd 0.00980        0    0 B     0 B     0 B    0 B      0 B     0 B     0    0 100     up
     2   hdd 0.00980  1.00000 10 GiB 1.5 GiB 525 MiB  8 KiB 1024 MiB 8.5 GiB 15.13 0.89  20     up
     5   hdd 0.00980  1.00000 10 GiB 1.9 GiB 941 MiB  4 KiB 1024 MiB 8.1 GiB 19.20 1.13  43     up
     8   hdd 0.00980  1.00000 10 GiB 1.6 GiB 657 MiB  8 KiB 1024 MiB 8.4 GiB 16.42 0.97  37     up
                        TOTAL 90 GiB  15 GiB 6.2 GiB 61 KiB  9.0 GiB  75 GiB 16.92
    MIN/MAX VAR: 0.89/1.13  STDDEV: 1.32

    Tarek Zegar

    Senior SDS Engineer

    Email tzegar@xxxxxxxxxx
    Mobile
    630.974.7172





    From: Paul Emmerich <paul.emmerich@xxxxxxxx>
    To: Tarek Zegar <tzegar@xxxxxxxxxx>
    Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
    Date: 06/07/2019 05:25 AM
    Subject: [EXTERNAL] Re: [ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size




    remapped no longer triggers a health warning in nautilus.

    Your data is still there, it's just on the wrong OSD if that OSD is still up and running.


    Paul



    On Thu, Jun 6, 2019 at 10:48 PM Tarek Zegar <tzegar@xxxxxxxxxx> wrote:
        For testing purposes I set a bunch of OSDs to weight 0; this correctly forces Ceph to stop using those OSDs. I took enough out that the UP set only had the pool min size number of OSDs (i.e., 2 OSDs).

        Two questions:
        1. Why doesn't the acting set eventually match the UP set and simply point to [6,5] only?
        2. Why are none of the PGs marked as undersized and degraded? The data is only hosted on 2 OSDs rather than the pool size (3); I would expect an undersized warning, and degraded for PGs with data.

        Example PG:
        PG 1.4d active+clean+remapped UP= [6,5] Acting = [6,5,4]

        OSD Tree:
        ID CLASS WEIGHT  TYPE NAME         STATUS REWEIGHT PRI-AFF
        -1       0.08817 root default
        -3       0.02939     host hostosd1
         0   hdd 0.00980         osd.0         up  1.00000 1.00000
         3   hdd 0.00980         osd.3         up  1.00000 1.00000
         6   hdd 0.00980         osd.6         up  1.00000 1.00000
        -5       0.02939     host hostosd2
         1   hdd 0.00980         osd.1         up        0 1.00000
         4   hdd 0.00980         osd.4         up        0 1.00000
         7   hdd 0.00980         osd.7         up        0 1.00000
        -7       0.02939     host hostosd3
         2   hdd 0.00980         osd.2         up  1.00000 1.00000
         5   hdd 0.00980         osd.5         up  1.00000 1.00000
         8   hdd 0.00980         osd.8         up        0 1.00000





        _______________________________________________
        ceph-users mailing list
        ceph-users@xxxxxxxxxxxxxx
        http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Thank you!
HuangJun

