Re: OSDs are not utilized evenly

Hi Joseph,

Replying to the autoscaler question: no, I don't use it.


On 11/4/22 22:45, Joseph Mundackal wrote:
Hi Denis,

Can you share the following data points?

ceph osd df tree (to see how the OSDs are distributed)
ceph osd crush rule dump (to see what your EC rule looks like)
ceph osd pool ls detail (to see the pools, their CRUSH rule mapping and their pg_num values)
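To compare utilization and PG counts side by side without reading a wrapped table by eye, the same data can be pulled as JSON. A minimal Python sketch, assuming `ceph osd df -f json` returns a top-level `nodes` list whose entries carry `id`, `utilization` and `pgs` keys (check the field names on your release); `summarize_osd_df` is a hypothetical helper, and the two sample rows are copied from the partial `ceph osd df` output quoted in this thread.

```python
import json
from statistics import mean


def summarize_osd_df(raw_json: str) -> dict:
    """Summarize utilization and PG spread from `ceph osd df -f json` text.

    Assumes a top-level "nodes" list with "utilization" (percent) and "pgs".
    """
    nodes = json.loads(raw_json)["nodes"]
    utils = [n["utilization"] for n in nodes]
    pgs = [n["pgs"] for n in nodes]
    return {
        "osds": len(nodes),
        "util_spread": round(max(utils) - min(utils), 2),  # percentage points
        "min_pgs": min(pgs),
        "max_pgs": max(pgs),
        "mean_pgs": round(mean(pgs), 1),
    }


# Two rows taken from the `ceph osd df` output quoted in this thread.
sample = json.dumps({"nodes": [
    {"id": 0, "utilization": 78.09, "pgs": 196},
    {"id": 124, "utilization": 71.20, "pgs": 195},
]})
print(summarize_osd_df(sample))
```

On a live cluster you would pass the text captured from `ceph osd df -f json` instead of `sample`.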

Also, the balancer reports
    "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect"
Is the autoscaler currently adjusting your PG counts?

-Joseph

On Wed, Nov 2, 2022 at 5:01 PM Denis Polom <denispolom@xxxxxxxxx> wrote:

    Hi Joseph,

    Thank you for the answer. But if I'm reading the 'ceph osd df'
    output I posted correctly, there are about 195 PGs per OSD.

    There are 608 OSDs in the pool, which is the only data pool.
    According to my calculation with the PG calculator, the PG count
    is fine.
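The PGS column of `ceph osd df` counts PG shards, not PGs: each PG of an EC pool is stored as k+m shards on k+m OSDs. A small sketch of that arithmetic; the pg_num, EC profile and OSD count in the first example are hypothetical, because the thread does not state them for this cluster, while the second example uses the 608 OSDs and roughly 195 PGs per OSD quoted above.

```python
def pgs_per_osd(pg_num: int, k: int, m: int, num_osds: int) -> float:
    """Average PG shards per OSD for one EC pool spread over num_osds OSDs."""
    # Each PG of a k+m erasure-coded pool is stored as k+m shards on k+m OSDs.
    return pg_num * (k + m) / num_osds


def total_pg_shards(num_osds: int, avg_pgs_per_osd: float) -> float:
    """Inverse view: total PG shards implied by an average PGs-per-OSD figure."""
    return num_osds * avg_pgs_per_osd


# Hypothetical pool: pg_num 8192, EC 4+2, 256 OSDs (illustrative values only).
print(pgs_per_osd(8192, 4, 2, 256))  # 192.0
# Figures from this thread: 608 OSDs at roughly 195 PGs each.
print(total_pg_shards(608, 195))  # 118560
```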


    On 11/1/22 14:03, Joseph Mundackal wrote:
    If the GB per PG is high, the balancer module won't be able to help.

    Your PG count per OSD also looks low (in the 30s), so increasing the
    number of PGs per pool would help with both problems.

    You can use the PG calculator to determine how many PGs each pool
    needs.
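The classic PG calculator formula is (target PGs per OSD × OSD count × %data) / pool size, rounded to a power of two, where pool size is the replica count or k+m for an EC pool. A sketch of those rules as I understand the old PGCalc page (a default target of 100 PGs per OSD; take the power of two below the raw value unless it is more than 25% below it); the function name is hypothetical and the EC 8+2 profile in the example is an assumption, not this cluster's profile.

```python
import math


def pgcalc_pg_num(num_osds: int, pool_size: int, data_fraction: float = 1.0,
                  target_pgs_per_osd: int = 100) -> int:
    """Suggested pg_num for one pool, following the classic PG calculator rules.

    pool_size is the replica count, or k+m for an erasure-coded pool.
    """
    raw = target_pgs_per_osd * num_osds * data_fraction / pool_size
    raw = max(raw, num_osds / pool_size)  # at least one PG per group of pool_size OSDs
    lower = 2 ** math.floor(math.log2(raw))
    # Take the power of two below raw unless it is more than 25% below raw.
    return lower if lower >= 0.75 * raw else lower * 2


# 608 OSDs with a hypothetical EC 8+2 profile holding all the data:
print(pgcalc_pg_num(608, 8 + 2))  # 8192
```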

    On Tue, Nov 1, 2022, 08:46 Denis Polom <denispolom@xxxxxxxxx> wrote:

        Hi

        I have observed on my Ceph cluster, running the latest Pacific
        release, that OSDs of the same size are utilized differently,
        even though the balancer is running and reports the distribution
        as perfectly balanced:

        {
             "active": true,
             "last_optimize_duration": "0:00:00.622467",
             "last_optimize_started": "Tue Nov  1 12:49:36 2022",
             "mode": "upmap",
             "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
             "plans": []
        }

        The balancer settings for upmap are:

           mgr  advanced  mgr/balancer/mode                       upmap
           mgr  advanced  mgr/balancer/upmap_max_deviation        1
           mgr  advanced  mgr/balancer/upmap_max_optimizations    20
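In upmap mode the balancer equalizes PG counts, not bytes: as I understand the Pacific module, it stops once every OSD's PG count is within `upmap_max_deviation` of its weight-proportional target. A simplified model of that check (not the mgr module's actual code; `osds_outside_deviation` is a hypothetical name) suggests why it reports nothing left to do here: with 195 or 196 PGs on every OSD shown in the `ceph osd df` output in this message, no OSD is more than one PG away from the target, even though %USE differs by several points.

```python
from __future__ import annotations


def osds_outside_deviation(pgs_by_osd: dict[int, int], weights: dict[int, float],
                           max_deviation: float = 1) -> dict[int, float]:
    """OSDs whose PG count is more than max_deviation away from a weight-proportional target.

    Simplified model of the upmap balancer's stopping condition; bytes are ignored.
    """
    total_pgs = sum(pgs_by_osd.values())
    total_weight = sum(weights.values())
    outliers = {}
    for osd, pgs in pgs_by_osd.items():
        target = total_pgs * weights[osd] / total_weight
        if abs(pgs - target) > max_deviation:
            outliers[osd] = round(pgs - target, 2)
    return outliers


# PG counts for a few OSDs from the `ceph osd df` output in this thread (all weight 18.00020).
pgs = {0: 196, 124: 195, 157: 195, 245: 196, 549: 195, 568: 195}
weights = {osd: 18.00020 for osd in pgs}
print(osds_outside_deviation(pgs, weights))  # {}
```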

        The `ceph osd df` output clearly shows that utilization is not
        the same (the difference is about 1 TB). The following is only
        a partial output:

          ID  CLASS    WEIGHT  REWEIGHT    SIZE  RAW USE    DATA     OMAP    META    AVAIL   %USE   VAR  PGS  STATUS
           0    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  3.0 MiB  37 GiB  3.6 TiB  78.09  1.05  196      up
         124    hdd  18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  32 GiB  4.7 TiB  71.20  0.96  195      up
         157    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.3 MiB  35 GiB  3.7 TiB  77.67  1.05  195      up
           1    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.0 MiB  35 GiB  3.7 TiB  77.69  1.05  195      up
         243    hdd  18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.16  0.96  195      up
         244    hdd  18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.19  0.96  195      up
         245    hdd  18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  32 GiB  4.7 TiB  71.55  0.96  196      up
         246    hdd  18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.17  0.96  195      up
         249    hdd  18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  30 GiB  4.7 TiB  71.18  0.96  195      up
         500    hdd  18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  30 GiB  4.7 TiB  71.19  0.96  195      up
         501    hdd  18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.57  0.96  196      up
         502    hdd  18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.18  0.96  195      up
         532    hdd  18.00020   1.00000  16 TiB   12 TiB  12 TiB      0 B  31 GiB  4.7 TiB  71.16  0.96  195      up
         549    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  576 KiB  36 GiB  3.7 TiB  77.70  1.05  195      up
         550    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  3.8 MiB  36 GiB  3.7 TiB  77.67  1.05  195      up
         551    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.4 MiB  35 GiB  3.7 TiB  77.68  1.05  195      up
         552    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.5 MiB  35 GiB  3.7 TiB  77.69  1.05  195      up
         553    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.1 MiB  37 GiB  3.6 TiB  77.71  1.05  195      up
         554    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  967 KiB  36 GiB  3.6 TiB  77.71  1.05  195      up
         555    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  1.3 MiB  36 GiB  3.6 TiB  78.08  1.05  196      up
         556    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  4.7 MiB  36 GiB  3.6 TiB  78.10  1.05  196      up
         557    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.4 MiB  36 GiB  3.7 TiB  77.69  1.05  195      up
         558    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  4.5 MiB  36 GiB  3.6 TiB  77.72  1.05  195      up
         559    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  1.5 MiB  35 GiB  3.6 TiB  78.09  1.05  196      up
         560    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.2 MiB  35 GiB  3.7 TiB  77.69  1.05  195      up
         561    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.8 MiB  35 GiB  3.7 TiB  77.69  1.05  195      up
         562    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  1.0 MiB  36 GiB  3.7 TiB  77.68  1.05  195      up
         563    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  2.6 MiB  36 GiB  3.7 TiB  77.68  1.05  195      up
         564    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.1 MiB  36 GiB  3.6 TiB  78.09  1.05  196      up
         567    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  4.8 MiB  36 GiB  3.6 TiB  78.11  1.05  196      up
         568    hdd  18.00020   1.00000  16 TiB   13 TiB  13 TiB  5.2 MiB  35 GiB  3.7 TiB  77.68  1.05  195      up
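The partial output above can be quantified: PG counts differ by at most one across these OSDs, while %USE falls into two groups around 71% and 78%. A back-of-the-envelope sketch, assuming 18 TB drives (which `ceph osd df` would show as 16 TiB) and that each OSD's data is spread evenly over its PG shards; `summarize` and `OSD_BYTES` are hypothetical names. It estimates how many average-sized PG shards the observed utilization gap corresponds to.

```python
from statistics import mean

# (%USE, PGS) for the 31 OSDs listed above, in the same order.
rows = [
    (78.09, 196), (71.20, 195), (77.67, 195), (77.69, 195), (71.16, 195), (71.19, 195),
    (71.55, 196), (71.17, 195), (71.18, 195), (71.19, 195), (71.57, 196), (71.18, 195),
    (71.16, 195), (77.70, 195), (77.67, 195), (77.68, 195), (77.69, 195), (77.71, 195),
    (77.71, 195), (78.08, 196), (78.10, 196), (77.69, 195), (77.72, 195), (78.09, 196),
    (77.69, 195), (77.69, 195), (77.68, 195), (77.68, 195), (78.09, 196), (78.11, 196),
    (77.68, 195),
]
OSD_BYTES = 18e12  # assumption: 18 TB drives
TIB = 2 ** 40


def summarize(rows, osd_bytes=OSD_BYTES):
    uses = [u for u, _ in rows]
    pgs = [p for _, p in rows]
    osd_tib = osd_bytes / TIB
    spread_pct = max(uses) - min(uses)
    spread_gib = spread_pct / 100 * osd_tib * 1024
    # Average data per PG shard, assuming bytes are spread evenly over an OSD's shards.
    gib_per_shard = mean(uses) / 100 * osd_tib * 1024 / mean(pgs)
    return {
        "pg_count_spread": max(pgs) - min(pgs),
        "use_spread_pct": round(spread_pct, 2),
        "use_spread_tib": round(spread_gib / 1024, 2),
        "gib_per_pg_shard": round(gib_per_shard, 1),
        "shards_equivalent": round(spread_gib / gib_per_shard, 1),
    }


print(summarize(rows))
```

Under these assumptions the gap is worth on the order of 18 average PG shards, while the PG counts differ by at most one, so a balancer that only counts PGs sees nothing left to fix.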

        All OSDs are used by the same (EC) pool.

        I have the same issue on another Ceph cluster with the same
        setup. There I was able to even out OSD utilization by lowering
        the reweight from 1.00000 on the more heavily used OSDs, and I
        gained a lot of free space:

        Before changing the reweight:

        --- RAW STORAGE ---
        CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
        hdd    3.1 PiB  510 TiB  2.6 PiB   2.6 PiB      83.77
        ssd    2.6 TiB  2.6 TiB   46 GiB    46 GiB       1.70
        TOTAL  3.1 PiB  513 TiB  2.6 PiB   2.6 PiB      83.70

        --- POOLS ---
        POOL                   ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
        cephfs_data             3  8192  2.1 PiB  555.63M  2.6 PiB  91.02    216 TiB
        cephfs_metadata         4   128  7.5 GiB  140.22k   22 GiB   0.87    851 GiB
        device_health_metrics   5     1  4.1 GiB    1.15k  8.3 GiB      0    130 TiB


        After changing the reweight:

        --- RAW STORAGE ---
        CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
        hdd    3.1 PiB  522 TiB  2.6 PiB   2.6 PiB      83.38
        ssd    2.6 TiB  2.6 TiB   63 GiB    63 GiB       2.36
        TOTAL  3.1 PiB  525 TiB  2.6 PiB   2.6 PiB      83.31

        --- POOLS ---
        POOL                   ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
        cephfs_data             3  8192  2.1 PiB  555.63M  2.5 PiB  86.83    330 TiB
        cephfs_metadata         4   128  7.4 GiB  140.22k   22 GiB   0.87    846 GiB
        device_health_metrics   5     1  4.2 GiB    1.15k  8.4 GiB      0    198 TiB

        The free space I gained is almost 5%, which is about 100 TB!
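The jump in MAX AVAIL for cephfs_data (216 TiB before, 330 TiB after) is consistent with how, as I understand it, Ceph estimates that column: from the OSD projected to fill up first rather than from total free space. A simplified model under stated assumptions: new writes land on OSDs in proportion to fixed shares, a full ratio of 0.95, EC overhead of (k+m)/k, and a hypothetical 4+2 profile on four 16 TiB OSDs; Ceph's own calculation differs in detail, and `projected_max_avail` is a hypothetical name.

```python
from __future__ import annotations


def projected_max_avail(free_tib: list[float], shares: list[float], k: int, m: int) -> float:
    """Usable TiB a pool can still accept before its first OSD reaches the full ratio.

    free_tib[i]: raw space OSD i can take before hitting the full ratio.
    shares[i]:   fraction of the pool's new raw writes landing on OSD i (sums to 1).
    """
    raw_tib = min(free / share for free, share in zip(free_tib, shares))
    return raw_tib * k / (k + m)  # remove erasure-coding overhead


FULL = 16 * 0.95  # hypothetical: 16 TiB OSDs, full_ratio 0.95
uneven = [FULL - used for used in (13.0, 13.0, 12.0, 12.0)]
even = [FULL - used for used in (12.5, 12.5, 12.5, 12.5)]  # same 50 TiB in total
equal_shares = [0.25] * 4
print(round(projected_max_avail(uneven, equal_shares, 4, 2), 2))  # 5.87
print(round(projected_max_avail(even, equal_shares, 4, 2), 2))  # 7.2
```

With the same total data stored, evening out utilization raises the estimate because the fullest OSD now has more headroom.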

        This is just a workaround, and I'm not happy about keeping the
        reweight at a non-default value permanently.
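The manual workaround resembles what `ceph osd reweight-by-utilization` automates: lowering the override reweight of OSDs that sit above the average. A simplified single-pass sketch; the 102% threshold and the 0.05 step cap are illustrative choices, not Ceph's defaults or its exact algorithm, and `propose_reweights` is a hypothetical name. Each proposed value would then be applied with `ceph osd reweight <id> <weight>`.

```python
from __future__ import annotations


def propose_reweights(util: dict[int, float], reweight: dict[int, float],
                      overload_pct: float = 102.0, max_change: float = 0.05) -> dict[int, float]:
    """Suggest lower override reweights for OSDs above overload_pct % of mean utilization."""
    avg = sum(util.values()) / len(util)
    proposals = {}
    for osd, used_pct in util.items():
        if used_pct * 100 / avg <= overload_pct:
            continue  # close enough to the average; leave it alone
        # Scale the reweight by avg/used, but move it at most max_change per pass.
        target = reweight[osd] * avg / used_pct
        proposals[osd] = round(max(target, reweight[osd] - max_change), 4)
    return proposals


util = {0: 78.09, 124: 71.20, 549: 77.70, 243: 71.16}  # %USE from the output above
print(propose_reweights(util, {osd: 1.0 for osd in util}))
```

Reweight changes move whole PGs, so the resulting utilization is not exactly proportional; in practice this is iterated in small steps.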

        Do you have any advice on which settings can or should be
        adjusted to keep OSD utilization even? Obviously neither the
        upmap balancer nor crush-compat mode is working correctly, at
        least in my case.

        Many thanks!







        _______________________________________________
        ceph-users mailing list -- ceph-users@xxxxxxx
        To unsubscribe send an email to ceph-users-leave@xxxxxxx
