Re: troubleshooting space usage

Igor Fedotov <ifedotov@xxxxxxx> · Wed, 3 Jul 2019 14:29:33 +0300



    Hi Andrei,
    Additionally I'd like to see performance counters dump for a
      couple of HDD OSDs (obtained through 'ceph daemon osd.N perf dump'
      command).
    W.r.t average object size - I was thinking that you might know
      what objects had been uploaded... If not then you might want to
      estimate it by using "rados get" command on the pool: retrieve
      some random object set and check their sizes. But let's check
      performance counters first - most probably they will show loses
      caused by allocation.
    

    Also I've just found similar issue (still unresolved) in our
      internal tracker - but its root cause is definitely different from
      allocation overhead. Looks like some orphaned objects in the pool.
      Could you please compare and share the amounts of objects in the
      pool reported by "ceph (or rados) df detail" and radosgw tools?

    
    Thanks,
    Igor

    
    On 7/3/2019 12:56 PM, Andrei
      Mikhailovsky wrote:

    
        Hi Igor,
        

        Many thanks for your reply. Here are the details about the
          cluster:
        

        1. Ceph version - 13.2.5-1xenial (installed from Ceph
          repository for ubuntu 16.04)
        

        2. main devices for radosgw pool - hdd. we do use a few
          ssds for the other pool, but it is not used by radosgw
        

        3. we use BlueStore
        

        4. Average rgw object size - I have no idea how to check
          that. Couldn't find a simple answer from google either. Could
          you please let me know how to check that?
        

        5. Ceph osd df tree:
        

        6. Other useful info on the cluster:
        

          # ceph osd df tree
          ID  CLASS WEIGHT    REWEIGHT SIZE    USE     AVAIL   %USE
             VAR  PGS TYPE NAME
          

           -1       112.17979        - 113 TiB  90 TiB  23 TiB
            79.25 1.00   - root uk
           -5       112.17979        - 113 TiB  90 TiB  23 TiB
            79.25 1.00   -     datacenter ldex
          -11       112.17979        - 113 TiB  90 TiB  23 TiB
            79.25 1.00   -         room ldex-dc3
          -13       112.17979        - 113 TiB  90 TiB  23 TiB
            79.25 1.00   -             row row-a
           -4       112.17979        - 113 TiB  90 TiB  23 TiB
            79.25 1.00   -                 rack ldex-rack-a5
           -2        28.04495        -  28 TiB  22 TiB 6.2 TiB
            77.96 0.98   -                     host arh-ibstorage1-ib
          

            0   hdd   2.73000  0.79999 2.8 TiB 2.3 TiB 519 GiB
            81.61 1.03 145                         osd.0
            1   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 847 GiB
            70.00 0.88 130                         osd.1
          
             2   hdd   2.73000  1.00000 2.8 TiB 2.2 TiB 561 GiB
              80.12 1.01 152                         osd.2
              3   hdd   2.73000  1.00000 2.8 TiB 2.3 TiB 469 GiB
              83.41 1.05 160                         osd.3
              4   hdd   2.73000  1.00000 2.8 TiB 1.8 TiB 983 GiB
              65.18 0.82 141                         osd.4
             32   hdd   5.45999  1.00000 5.5 TiB 4.4 TiB 1.1 TiB
              80.68 1.02 306                         osd.32
             35   hdd   2.73000  1.00000 2.8 TiB 1.7 TiB 1.0 TiB
              62.89 0.79 126                         osd.35
             36   hdd   2.73000  1.00000 2.8 TiB 2.3 TiB 464 GiB
              83.58 1.05 175                         osd.36
             37   hdd   2.73000  0.89999 2.8 TiB 2.5 TiB 301 GiB
              89.34 1.13 160                         osd.37
              5   ssd   0.74500  1.00000 745 GiB 642 GiB 103 GiB
              86.15 1.09  65                         osd.5
            

             -3        28.04495        -  28 TiB  24 TiB 4.5 TiB
              84.03 1.06   -                     host arh-ibstorage2-ib
              9   hdd   2.73000  0.95000 2.8 TiB 2.4 TiB 405 GiB
              85.65 1.08 158                         osd.9
             10   hdd   2.73000  0.89999 2.8 TiB 2.4 TiB 352 GiB
              87.52 1.10 169                         osd.10
            
               11   hdd   2.73000  1.00000 2.8 TiB 2.0 TiB 783 GiB
                72.28 0.91 160                         osd.11
               12   hdd   2.73000  0.84999 2.8 TiB 2.4 TiB 359 GiB
                87.27 1.10 153                         osd.12
               13   hdd   2.73000  1.00000 2.8 TiB 2.4 TiB 348 GiB
                87.69 1.11 169                         osd.13
               14   hdd   2.73000  1.00000 2.8 TiB 2.5 TiB 283 GiB
                89.97 1.14 170                         osd.14
               15   hdd   2.73000  1.00000 2.8 TiB 2.2 TiB 560 GiB
                80.18 1.01 155                         osd.15
               16   hdd   2.73000  0.95000 2.8 TiB 2.4 TiB 332 GiB
                88.26 1.11 178                         osd.16
               26   hdd   5.45999  1.00000 5.5 TiB 4.4 TiB 1.0 TiB
                81.04 1.02 324                         osd.26
                7   ssd   0.74500  1.00000 745 GiB 607 GiB 138 GiB
                81.48 1.03  62                         osd.7
              

              -15        28.04495        -  28 TiB  22 TiB 6.4 TiB
                77.40 0.98   -                     host
                arh-ibstorage3-ib
               18   hdd   2.73000  0.95000 2.8 TiB 2.5 TiB 312 GiB
                88.96 1.12 156                         osd.18
               19   hdd   2.73000  1.00000 2.8 TiB 2.0 TiB 771 GiB
                72.68 0.92 162                         osd.19
               20   hdd   2.73000  1.00000 2.8 TiB 2.0 TiB 733 GiB
                74.04 0.93 149                         osd.20
              
                 21   hdd   2.73000  1.00000 2.8 TiB 2.2 TiB 533
                  GiB 81.12 1.02 155                         osd.21
                 22   hdd   2.73000  1.00000 2.8 TiB 2.1 TiB 692
                  GiB 75.48 0.95 144                         osd.22
                 23   hdd   2.73000  1.00000 2.8 TiB 1.6 TiB 1.1
                  TiB 58.43 0.74 130                         osd.23
                 24   hdd   2.73000  1.00000 2.8 TiB 2.2 TiB 579
                  GiB 79.51 1.00 146                         osd.24
                 25   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 886
                  GiB 68.63 0.87 147                         osd.25
                 31   hdd   5.45999  1.00000 5.5 TiB 4.7 TiB 758
                  GiB 86.50 1.09 326                         osd.31
                  6   ssd   0.74500  0.89999 744 GiB 640 GiB 104
                  GiB 86.01 1.09  61                         osd.6
                

                -17        28.04494        -  28 TiB  22 TiB 6.3
                  TiB 77.61 0.98   -                     host
                  arh-ibstorage4-ib
                  8   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 909
                  GiB 67.80 0.86 141                         osd.8
                 17   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 904
                  GiB 67.99 0.86 144                         osd.17
                 27   hdd   2.73000  1.00000 2.8 TiB 2.1 TiB 654
                  GiB 76.84 0.97 152                         osd.27
                 28   hdd   2.73000  1.00000 2.8 TiB 2.3 TiB 481
                  GiB 82.98 1.05 153                         osd.28
                
                   29   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 829
                    GiB 70.65 0.89 137                         osd.29
                   30   hdd   2.73000  1.00000 2.8 TiB 2.0 TiB 762
                    GiB 73.03 0.92 142                         osd.30
                   33   hdd   2.73000  1.00000 2.8 TiB 2.3 TiB 501
                    GiB 82.25 1.04 166                         osd.33
                   34   hdd   5.45998  1.00000 5.5 TiB 4.5 TiB 968
                    GiB 82.77 1.04 325                         osd.34
                   39   hdd   2.73000  0.95000 2.8 TiB 2.4 TiB 402
                    GiB 85.77 1.08 162                         osd.39
                   38   ssd   0.74500  1.00000 745 GiB 671 GiB  74
                    GiB 90.02 1.14  68                         osd.38
                                         TOTAL 113 TiB  90 TiB  23
                    TiB 79.25
                  MIN/MAX VAR: 0.74/1.14  STDDEV: 8.14
                
              
        # for i in $(radosgw-admin
            bucket list | jq -r '.[]'); do radosgw-admin bucket stats
            --bucket=$i | jq '.usage | ."rgw.main" | .size_kb' ; done |
            awk '{ SUM += $1} END { print SUM/1024/1024/1024 }'
        6.59098
        

        # ceph df
        
          
            GLOBAL:
                SIZE        AVAIL      RAW USED     %RAW USED
                113 TiB     23 TiB       90 TiB         79.25
          
          
          POOLS:
              NAME                           ID     USED      
             %USED     MAX AVAIL     OBJECTS
              Primary-ubuntu-1               5       27 TiB    
            87.56       3.9 TiB     7302534
              .users.uid                     15     6.8 KiB        
            0       3.9 TiB          39
              .users                         16       335 B        
            0       3.9 TiB          20
              .users.swift                   17        14 B        
            0       3.9 TiB           1
             
                .rgw.buckets                   19      15 TiB     79.88
                      3.9 TiB     8787763
              .users.email                   22         0 B        
            0       3.9 TiB           0
              .log                           24     109 MiB        
            0       3.9 TiB      102301
              .rgw.buckets.extra             37         0 B        
            0       2.6 TiB           0
              .rgw.root                      44     2.9 KiB        
            0       2.6 TiB          16
              .rgw.meta                      45     1.7 MiB        
            0       2.6 TiB        6249
              .rgw.control                   46         0 B        
            0       2.6 TiB           8
              .rgw.gc                        47         0 B        
            0       2.6 TiB          32
              .usage                         52         0 B        
            0       2.6 TiB           0
              .intent-log                    53         0 B        
            0       2.6 TiB           0
              default.rgw.buckets.non-ec     54         0 B        
            0       2.6 TiB           0
              .rgw.buckets.index             55         0 B        
            0       2.6 TiB       11485
              .rgw                           56     491 KiB        
            0       2.6 TiB        1686
              Primary-ubuntu-1-ssd           57     1.2 TiB    
            92.39       105 GiB      379516
          

          I am not too sure if the issue relates to the BlueStore
            overhead as I would probably have seen the discrepancy in my
            Primary-ubuntu-1 pool as well. However, the data usage on
            Primary-ubuntu-1 pool seems to be consistent with my
            expectations (precise numbers to be verified soon). The
            issues seems to be only with the .rgw-buckets pool where the
            "ceph df " output shows 15TB of usage and the sum of all
            buckets in that pool shows just over 6.5TB.
          

          Cheers
          

          Andrei
          

          From:
            "Igor Fedotov" <ifedotov@xxxxxxx>

            To: "andrei" <andrei@xxxxxxxxxx>, "ceph-users"
            <ceph-users@xxxxxxxxxxxxxx>

            Sent: Tuesday, 2 July, 2019 10:58:54

            Subject: Re:  troubleshooting space
            usage

          
            Hi Andrei,
            The most obvious reason is space usage overhead caused by
              BlueStore allocation granularity, e.g. if
              bluestore_min_alloc_size is 64K  and average object size
              is 16K one will waste 48K per object in average. This is
              rather a speculation so far as we lack key the information
              about your cluster:
            - Ceph version
            - What are the main devices for OSD: hdd or ssd.
            - BlueStore or FileStore.
            - average RGW object size.

            
            You might also want to collect and share performance
              counter dumps (ceph daemon osd.N perf dump) and "
          
          
            " reports from a couple of your OSDs.
            

            Thanks,
            Igor

            
            On 7/2/2019 11:43 AM, Andrei
              Mikhailovsky wrote:

            
                Bump!
                

                  From:
                    "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>

                    To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>

                    Sent: Friday, 28 June, 2019 14:54:53

                    Subject:  troubleshooting space
                    usage

                  
                      Hi
                      

                      Could someone please explain / show how to
                        troubleshoot the space usage in Ceph and how to
                        reclaim the unused space?
                      

                      I have a small cluster with 40 osds, replica
                        of 2, mainly used as a backend for cloud stack
                        as well as the S3 gateway. The used space
                        doesn't make any sense to me, especially the rgw
                        pool, so I am seeking help.
                      

                      Here is what I found from the client: 
                      

                      Ceph -s shows the
                      

                       usage:   89 TiB used, 24 TiB / 113 TiB avail

                      
                      Ceph df shows:
                      

                      Primary-ubuntu-1               5       27 TiB
                            90.11       3.0 TiB     7201098
                      Primary-ubuntu-1-ssd           57     1.2 TiB
                            89.62       143 GiB      359260   
                      .rgw.buckets  
                                          19      15 TiB     83.73      
                          3.0 TiB     8742222
                      

                      the usage of
                          the Primary-ubuntu-1 and Primary-ubuntu-1-ssd
                          is in line with my expectations.
                        However, the .rgw.buckets pool seems to be using
                        way too much. The usage of all rgw buckets shows
                        6.5TB usage (looking at the size_kb values from
                        the "radosgw-admin bucket stats"). I am trying
                        to figure out why .rgw.buckets is using 15TB of
                        space instead of the 6.5TB as shown from the
                        bucket usage. 
                      

                      Thanks
                      

                      Andrei
                    
                    
                    _______________________________________________

                    ceph-users mailing list

                    ceph-users@xxxxxxxxxxxxxx

                    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

                  
              _______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

            
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com