From: "Igor Fedotov" <ifedotov@xxxxxxx>
To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Thursday, 4 July, 2019 12:52:16
Subject: Re: troubleshooting space usage
Yep, this looks fine..
Hmm... sorry, but I'm running out of ideas about what's happening.
Anyway, I think the ceph reports are more trustworthy than the rgw ones. It looks like an issue with rgw reporting, or maybe some object leakage.
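If it does come down to leaked/orphaned objects, radosgw-admin's orphan search may help to narrow it down. A rough sketch only (it assumes the data pool is .rgw.buckets; the job id is arbitrary, the scan reads the whole pool so it can take a long time, and the orphans subcommands may differ slightly between releases):

# find RADOS objects in the data pool that no bucket index references
radosgw-admin orphans find --pool=.rgw.buckets --job-id=orphan-scan-1
# review / clean up the search state afterwards
radosgw-admin orphans list-jobs
radosgw-admin orphans finish --job-id=orphan-scan-1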
Regards,
Igor
On 7/3/2019 6:34 PM, Andrei Mikhailovsky wrote:
Hi Igor.
The numbers are identical it seems:
.rgw.buckets 19 15 TiB 78.22 4.3 TiB 8786934
# cat /root/ceph-rgw.buckets-rados-ls-all | wc -l
8786934
Cheers
From: "Igor Fedotov" <ifedotov@xxxxxxx>
To: "andrei" <andrei@xxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Wednesday, 3 July, 2019 13:49:02
Subject: Re: troubleshooting space usage
Looks fine - comparing bluestore_allocated vs. bluestore_stored shows little difference, so allocation overhead is not the cause.
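For reference, such a comparison can be pulled straight from a perf dump; a minimal sketch, assuming the counters sit under the "bluestore" section of the dump and that osd.0 is just an example id (the ratio also assumes stored > 0):

# allocated vs. stored bytes on a single OSD, plus their ratio
ceph daemon osd.0 perf dump | \
  jq '.bluestore | { allocated: .bluestore_allocated, stored: .bluestore_stored, ratio: (.bluestore_allocated / .bluestore_stored) }'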
What about comparing the object counts reported by the ceph and radosgw tools?
Igor.
On 7/3/2019 3:25 PM, Andrei Mikhailovsky wrote:
Thanks Igor. Here is a link to the ceph perf data for several OSDs.
In terms of object sizes: we use rgw to back up data from various workstations and servers, so the sizes range from a few KB to a few GB per individual file.
Cheers
From: "Igor Fedotov" <ifedotov@xxxxxxx>
To: "andrei" <andrei@xxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Wednesday, 3 July, 2019 12:29:33
Subject: Re: troubleshooting space usage
Hi Andrei,
Additionally, I'd like to see performance counter dumps for a couple of HDD OSDs (obtained via the 'ceph daemon osd.N perf dump' command).
W.r.t. the average object size - I was thinking that you might know what objects had been uploaded... If not, you might want to estimate it using the "rados get" command on the pool: retrieve a random set of objects and check their sizes. But let's check the performance counters first - most probably they will show losses caused by allocation.
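For example, instead of downloading objects with "rados get", their sizes can be sampled with "rados stat"; a sketch only (it assumes plain object names without spaces, and listing a big pool takes a while):

# take 100 random objects from the pool and average their sizes (bytes)
rados -p .rgw.buckets ls | shuf -n 100 | while read obj; do
  rados -p .rgw.buckets stat "$obj"
done | awk '{ sum += $NF } END { print "objects:", NR, "avg bytes:", sum/NR }'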
Also, I've just found a similar issue (still unresolved) in our internal tracker, but its root cause is definitely different from allocation overhead. It looks like orphaned objects in the pool. Could you please compare and share the object counts for the pool as reported by "ceph (or rados) df detail" and by the radosgw tools?
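A rough way to get both numbers side by side (a sketch, assuming jq is installed and that per-bucket counts are exposed under usage/"rgw.main"/num_objects in the bucket stats output):

# object count for the data pool as RADOS sees it
rados df | grep '\.rgw\.buckets '
# sum of per-bucket object counts as radosgw sees it
for b in $(radosgw-admin bucket list | jq -r '.[]'); do
  radosgw-admin bucket stats --bucket="$b" | jq '.usage."rgw.main".num_objects // 0'
done | awk '{ sum += $1 } END { print sum }'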
Thanks,
Igor
On 7/3/2019 12:56 PM, Andrei Mikhailovsky wrote:
Hi Igor,
Many thanks for your reply. Here are the details about the cluster:
1. Ceph version - 13.2.5-1xenial (installed from the Ceph repository for Ubuntu 16.04)
2. Main devices for the radosgw pool - HDD. We do use a few SSDs for the other pool, but it is not used by radosgw.
3. We use BlueStore.
4. Average rgw object size - I have no idea how to check that, and couldn't find a simple answer on Google either. Could you please let me know how to check it?
5. Ceph osd df tree:
6. Other useful info on the cluster:
# ceph osd df tree
ID  CLASS WEIGHT    REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME
 -1       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   - root uk
 -5       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -     datacenter ldex
-11       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -         room ldex-dc3
-13       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -             row row-a
 -4       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -                 rack ldex-rack-a5
 -2        28.04495        -  28 TiB  22 TiB 6.2 TiB 77.96 0.98   -                     host arh-ibstorage1-ib
  0   hdd   2.73000  0.79999 2.8 TiB 2.3 TiB 519 GiB 81.61 1.03 145                         osd.0
  1   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 847 GiB 70.00 0.88 130                         osd.1
  2   hdd   2.73000  1.00000 2.8 TiB 2.2 TiB 561 GiB 80.12 1.01 152                         osd.2
  3   hdd   2.73000  1.00000 2.8 TiB 2.3 TiB 469 GiB 83.41 1.05 160                         osd.3
  4   hdd   2.73000  1.00000 2.8 TiB 1.8 TiB 983 GiB 65.18 0.82 141                         osd.4
 32   hdd   5.45999  1.00000 5.5 TiB 4.4 TiB 1.1 TiB 80.68 1.02 306                         osd.32
 35   hdd   2.73000  1.00000 2.8 TiB 1.7 TiB 1.0 TiB 62.89 0.79 126                         osd.35
 36   hdd   2.73000  1.00000 2.8 TiB 2.3 TiB 464 GiB 83.58 1.05 175                         osd.36
 37   hdd   2.73000  0.89999 2.8 TiB 2.5 TiB 301 GiB 89.34 1.13 160                         osd.37
  5   ssd   0.74500  1.00000 745 GiB 642 GiB 103 GiB 86.15 1.09  65                         osd.5
 -3        28.04495        -  28 TiB  24 TiB 4.5 TiB 84.03 1.06   -                     host arh-ibstorage2-ib
  9   hdd   2.73000  0.95000 2.8 TiB 2.4 TiB 405 GiB 85.65 1.08 158                         osd.9
 10   hdd   2.73000  0.89999 2.8 TiB 2.4 TiB 352 GiB 87.52 1.10 169                         osd.10
 11   hdd   2.73000  1.00000 2.8 TiB 2.0 TiB 783 GiB 72.28 0.91 160                         osd.11
 12   hdd   2.73000  0.84999 2.8 TiB 2.4 TiB 359 GiB 87.27 1.10 153                         osd.12
 13   hdd   2.73000  1.00000 2.8 TiB 2.4 TiB 348 GiB 87.69 1.11 169                         osd.13
 14   hdd   2.73000  1.00000 2.8 TiB 2.5 TiB 283 GiB 89.97 1.14 170                         osd.14
 15   hdd   2.73000  1.00000 2.8 TiB 2.2 TiB 560 GiB 80.18 1.01 155                         osd.15
 16   hdd   2.73000  0.95000 2.8 TiB 2.4 TiB 332 GiB 88.26 1.11 178                         osd.16
 26   hdd   5.45999  1.00000 5.5 TiB 4.4 TiB 1.0 TiB 81.04 1.02 324                         osd.26
  7   ssd   0.74500  1.00000 745 GiB 607 GiB 138 GiB 81.48 1.03  62                         osd.7
-15        28.04495        -  28 TiB  22 TiB 6.4 TiB 77.40 0.98   -                     host arh-ibstorage3-ib
 18   hdd   2.73000  0.95000 2.8 TiB 2.5 TiB 312 GiB 88.96 1.12 156                         osd.18
 19   hdd   2.73000  1.00000 2.8 TiB 2.0 TiB 771 GiB 72.68 0.92 162                         osd.19
 20   hdd   2.73000  1.00000 2.8 TiB 2.0 TiB 733 GiB 74.04 0.93 149                         osd.20
 21   hdd   2.73000  1.00000 2.8 TiB 2.2 TiB 533 GiB 81.12 1.02 155                         osd.21
 22   hdd   2.73000  1.00000 2.8 TiB 2.1 TiB 692 GiB 75.48 0.95 144                         osd.22
 23   hdd   2.73000  1.00000 2.8 TiB 1.6 TiB 1.1 TiB 58.43 0.74 130                         osd.23
 24   hdd   2.73000  1.00000 2.8 TiB 2.2 TiB 579 GiB 79.51 1.00 146                         osd.24
 25   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 886 GiB 68.63 0.87 147                         osd.25
 31   hdd   5.45999  1.00000 5.5 TiB 4.7 TiB 758 GiB 86.50 1.09 326                         osd.31
  6   ssd   0.74500  0.89999 744 GiB 640 GiB 104 GiB 86.01 1.09  61                         osd.6
-17        28.04494        -  28 TiB  22 TiB 6.3 TiB 77.61 0.98   -                     host arh-ibstorage4-ib
  8   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 909 GiB 67.80 0.86 141                         osd.8
 17   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 904 GiB 67.99 0.86 144                         osd.17
 27   hdd   2.73000  1.00000 2.8 TiB 2.1 TiB 654 GiB 76.84 0.97 152                         osd.27
 28   hdd   2.73000  1.00000 2.8 TiB 2.3 TiB 481 GiB 82.98 1.05 153                         osd.28
 29   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 829 GiB 70.65 0.89 137                         osd.29
 30   hdd   2.73000  1.00000 2.8 TiB 2.0 TiB 762 GiB 73.03 0.92 142                         osd.30
 33   hdd   2.73000  1.00000 2.8 TiB 2.3 TiB 501 GiB 82.25 1.04 166                         osd.33
 34   hdd   5.45998  1.00000 5.5 TiB 4.5 TiB 968 GiB 82.77 1.04 325                         osd.34
 39   hdd   2.73000  0.95000 2.8 TiB 2.4 TiB 402 GiB 85.77 1.08 162                         osd.39
 38   ssd   0.74500  1.00000 745 GiB 671 GiB  74 GiB 90.02 1.14  68                         osd.38
                       TOTAL 113 TiB  90 TiB  23 TiB 79.25
MIN/MAX VAR: 0.74/1.14  STDDEV: 8.14
# for i in $(radosgw-admin bucket list | jq -r '.[]'); do radosgw-admin bucket stats --bucket=$i | jq '.usage | ."rgw.main" | .size_kb' ; done | awk '{ SUM += $1 } END { print SUM/1024/1024/1024 }'
6.59098
# ceph df
GLOBAL:
    SIZE        AVAIL      RAW USED     %RAW USED
    113 TiB     23 TiB     90 TiB       79.25
POOLS:
    NAME                           ID     USED        %USED     MAX AVAIL     OBJECTS
    Primary-ubuntu-1               5      27 TiB      87.56     3.9 TiB       7302534
    .users.uid                     15     6.8 KiB     0         3.9 TiB       39
    .users                         16     335 B       0         3.9 TiB       20
    .users.swift                   17     14 B        0         3.9 TiB       1
    .rgw.buckets                   19     15 TiB      79.88     3.9 TiB       8787763
    .users.email                   22     0 B         0         3.9 TiB       0
    .log                           24     109 MiB     0         3.9 TiB       102301
    .rgw.buckets.extra             37     0 B         0         2.6 TiB       0
    .rgw.root                      44     2.9 KiB     0         2.6 TiB       16
    .rgw.meta                      45     1.7 MiB     0         2.6 TiB       6249
    .rgw.control                   46     0 B         0         2.6 TiB       8
    .rgw.gc                        47     0 B         0         2.6 TiB       32
    .usage                         52     0 B         0         2.6 TiB       0
    .intent-log                    53     0 B         0         2.6 TiB       0
    default.rgw.buckets.non-ec     54     0 B         0         2.6 TiB       0
    .rgw.buckets.index             55     0 B         0         2.6 TiB       11485
    .rgw                           56     491 KiB     0         2.6 TiB       1686
    Primary-ubuntu-1-ssd           57     1.2 TiB     92.39     105 GiB       379516
I am not sure the issue relates to BlueStore overhead, as I would probably have seen the discrepancy in my Primary-ubuntu-1 pool as well. However, the data usage of the Primary-ubuntu-1 pool seems consistent with my expectations (precise numbers to be verified soon). The issue seems to affect only the .rgw.buckets pool, where the "ceph df" output shows 15TB of usage while the sum of all buckets in that pool comes to just over 6.5TB.
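As a rough sanity check on the overhead theory, here is the average object size per pool from the "ceph df" numbers above (just USED divided by OBJECTS; approximate, of course):

# average object size per pool, in MiB
awk 'BEGIN {
  printf "Primary-ubuntu-1: %.1f MiB/object\n", 27 * 1024 * 1024 / 7302534
  printf ".rgw.buckets:     %.1f MiB/object\n", 15 * 1024 * 1024 / 8787763
}'
# roughly 3.9 MiB and 1.8 MiB respectively - both far above a 64K
# allocation unit, so padding alone should not come close to doubling the usage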
Cheers
Andrei
From: "Igor Fedotov" <ifedotov@xxxxxxx>
To: "andrei" <andrei@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Tuesday, 2 July, 2019 10:58:54
Subject: Re: troubleshooting space usage
Hi Andrei,
The most obvious reason is space usage overhead caused by BlueStore allocation granularity: e.g. if bluestore_min_alloc_size is 64K and the average object size is 16K, one will waste 48K per object on average. This is rather speculation so far, as we lack the key information about your cluster:
- Ceph version
- What are the main devices for OSD: hdd or ssd.
- BlueStore or FileStore.
- average RGW object size.
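If it helps, the allocation granularity can be checked on a running OSD; a small sketch (the option names below are the usual per-device-class BlueStore settings, and the value actually in effect is the one the OSD was created with):

# minimum allocation unit (bytes) as configured for HDD / SSD OSDs
ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
ceph daemon osd.0 config get bluestore_min_alloc_size_ssd
# with a 65536-byte unit, an object of S bytes occupies ceil(S/65536)*65536,
# so e.g. a 16K object still consumes 64K on disk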
You might also want to collect and share performance counter dumps ("ceph daemon osd.N perf dump") and "..." reports from a couple of your OSDs.
Thanks,
Igor
On 7/2/2019 11:43 AM, Andrei Mikhailovsky wrote:
Bump!
From: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Friday, 28 June, 2019 14:54:53
Subject: troubleshooting space usage
Hi
Could someone please explain / show how to troubleshoot the space usage in Ceph and how to reclaim the unused space?
I have a small cluster with 40 OSDs and a replica count of 2, mainly used as a backend for CloudStack as well as an S3 gateway. The used space doesn't make sense to me, especially for the rgw pool, so I am seeking help.
Here is what I found from the client:
ceph -s shows:
    usage: 89 TiB used, 24 TiB / 113 TiB avail
ceph df shows:
    Primary-ubuntu-1        5     27 TiB      90.11     3.0 TiB     7201098
    Primary-ubuntu-1-ssd    57    1.2 TiB     89.62     143 GiB     359260
    .rgw.buckets            19    15 TiB      83.73     3.0 TiB     8742222
The usage of Primary-ubuntu-1 and Primary-ubuntu-1-ssd is in line with my expectations. However, the .rgw.buckets pool seems to be using far too much space. The usage summed over all rgw buckets comes to about 6.5TB (looking at the size_kb values from "radosgw-admin bucket stats"). I am trying to figure out why .rgw.buckets is using 15TB of space instead of the 6.5TB shown by the bucket usage.
Thanks
Andrei
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com