Re: troubleshooting space usage

Andrei Mikhailovsky <andrei@xxxxxxxxxx> · Wed, 3 Jul 2019 16:34:59 +0100 (BST)

Hi Igor. 

The numbers seem to be identical:

    .rgw.buckets                   19      15 TiB     78.22       4.3 TiB     8786934

# cat /root/ceph-rgw.buckets-rados-ls-all |wc -l
8786934

Cheers
From: "Igor Fedotov" <ifedotov@xxxxxxx>
To: "andrei" <andrei@xxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Wednesday, 3 July, 2019 13:49:02
Subject: Re:  troubleshooting space usage
Looks fine: comparing bluestore_allocated vs. bluestore_stored
      shows only a small difference, so it's not the allocation overhead.
    What about comparing the object counts reported by the ceph and
      radosgw tools?

    Igor.
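Igor's comparison above can be scripted per OSD. This is a minimal Python sketch, and it rests on a few assumptions. The output of `ceph daemon osd.N perf dump` is assumed to have been saved as JSON. Its "bluestore" section is assumed to carry `bluestore_allocated` and `bluestore_stored` as byte counts. The helper name `allocation_overhead` is hypothetical.

```python
import json
from typing import Tuple


def allocation_overhead(perf_dump_json: str) -> Tuple[int, float]:
    """Return (padding_bytes, allocated/stored ratio) for one OSD.

    Assumes the "bluestore" section of `ceph daemon osd.N perf dump`
    exposes bluestore_allocated and bluestore_stored as byte counts.
    """
    counters = json.loads(perf_dump_json)["bluestore"]
    allocated = counters["bluestore_allocated"]
    stored = counters["bluestore_stored"]
    padding = allocated - stored
    # Guard against an empty OSD to avoid dividing by zero.
    ratio = allocated / stored if stored else float("inf")
    return padding, ratio
```

A ratio close to 1.0 corresponds to the "little difference" Igor mentions. A workload of many objects much smaller than `bluestore_min_alloc_size` would push the ratio noticeably above 1.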

    On 7/3/2019 3:25 PM, Andrei Mikhailovsky wrote:

        Thanks Igor. Here is a link to the ceph perf data for
          several OSDs.

        https://paste.ee/p/IzDMy

        As for object sizes: we use RGW to back up data from
          various workstations and servers, so individual files range
          from a few KB to a few GB.

        Cheers

          From: "Igor Fedotov" <ifedotov@xxxxxxx>
          To: "andrei" <andrei@xxxxxxxxxx>
          Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
          Sent: Wednesday, 3 July, 2019 12:29:33
          Subject: Re:  troubleshooting space usage

            Hi Andrei,
            Additionally, I'd like to see the performance counter dumps
              for a couple of HDD OSDs (obtained with the 'ceph daemon
              osd.N perf dump' command).
            W.r.t. the average object size: I was thinking that you might
              know what objects had been uploaded... If not, you might
              want to estimate it with the "rados get" command on the
              pool: retrieve a random set of objects and check their
              sizes. But let's check the performance counters first; most
              probably they will show losses caused by allocation.

            Also, I've just found a similar issue (still unresolved) in
              our internal tracker, but its root cause is definitely
              different from allocation overhead: it looks like there are
              orphaned objects in the pool. Could you please compare and
              share the object counts for the pool as reported by "ceph
              (or rados) df detail" and by the radosgw tools?

            Thanks,
            Igor

            On 7/3/2019 12:56 PM, Andrei Mikhailovsky wrote:

                Hi Igor,

                Many thanks for your reply. Here are the details
                  about the cluster:

                1. Ceph version: 13.2.5-1xenial (installed from the
                  Ceph repository for Ubuntu 16.04)

                2. Main devices for the radosgw pool: HDD. We do use a
                  few SSDs for another pool, but it is not used by
                  radosgw.

                3. We use BlueStore.

                4. Average RGW object size: I have no idea how to
                  check that and couldn't find a simple answer on Google
                  either. Could you please let me know how to check
                  it?

                5. ceph osd df tree:

                  # ceph osd df tree
                   ID CLASS    WEIGHT REWEIGHT    SIZE     USE   AVAIL  %USE  VAR PGS TYPE NAME
                   -1       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   - root uk
                   -5       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -     datacenter ldex
                  -11       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -         room ldex-dc3
                  -13       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -             row row-a
                   -4       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -                 rack ldex-rack-a5
                   -2        28.04495        -  28 TiB  22 TiB 6.2 TiB 77.96 0.98   -                     host arh-ibstorage1-ib
                    0 hdd     2.73000  0.79999 2.8 TiB 2.3 TiB 519 GiB 81.61 1.03 145                         osd.0
                    1 hdd     2.73000  1.00000 2.8 TiB 1.9 TiB 847 GiB 70.00 0.88 130                         osd.1
                    2 hdd     2.73000  1.00000 2.8 TiB 2.2 TiB 561 GiB 80.12 1.01 152                         osd.2
                    3 hdd     2.73000  1.00000 2.8 TiB 2.3 TiB 469 GiB 83.41 1.05 160                         osd.3
                    4 hdd     2.73000  1.00000 2.8 TiB 1.8 TiB 983 GiB 65.18 0.82 141                         osd.4
                   32 hdd     5.45999  1.00000 5.5 TiB 4.4 TiB 1.1 TiB 80.68 1.02 306                         osd.32
                   35 hdd     2.73000  1.00000 2.8 TiB 1.7 TiB 1.0 TiB 62.89 0.79 126                         osd.35
                   36 hdd     2.73000  1.00000 2.8 TiB 2.3 TiB 464 GiB 83.58 1.05 175                         osd.36
                   37 hdd     2.73000  0.89999 2.8 TiB 2.5 TiB 301 GiB 89.34 1.13 160                         osd.37
                    5 ssd     0.74500  1.00000 745 GiB 642 GiB 103 GiB 86.15 1.09  65                         osd.5
                   -3        28.04495        -  28 TiB  24 TiB 4.5 TiB 84.03 1.06   -                     host arh-ibstorage2-ib
                    9 hdd     2.73000  0.95000 2.8 TiB 2.4 TiB 405 GiB 85.65 1.08 158                         osd.9
                   10 hdd     2.73000  0.89999 2.8 TiB 2.4 TiB 352 GiB 87.52 1.10 169                         osd.10
                   11 hdd     2.73000  1.00000 2.8 TiB 2.0 TiB 783 GiB 72.28 0.91 160                         osd.11
                   12 hdd     2.73000  0.84999 2.8 TiB 2.4 TiB 359 GiB 87.27 1.10 153                         osd.12
                   13 hdd     2.73000  1.00000 2.8 TiB 2.4 TiB 348 GiB 87.69 1.11 169                         osd.13
                   14 hdd     2.73000  1.00000 2.8 TiB 2.5 TiB 283 GiB 89.97 1.14 170                         osd.14
                   15 hdd     2.73000  1.00000 2.8 TiB 2.2 TiB 560 GiB 80.18 1.01 155                         osd.15
                   16 hdd     2.73000  0.95000 2.8 TiB 2.4 TiB 332 GiB 88.26 1.11 178                         osd.16
                   26 hdd     5.45999  1.00000 5.5 TiB 4.4 TiB 1.0 TiB 81.04 1.02 324                         osd.26
                    7 ssd     0.74500  1.00000 745 GiB 607 GiB 138 GiB 81.48 1.03  62                         osd.7
                  -15        28.04495        -  28 TiB  22 TiB 6.4 TiB 77.40 0.98   -                     host arh-ibstorage3-ib
                   18 hdd     2.73000  0.95000 2.8 TiB 2.5 TiB 312 GiB 88.96 1.12 156                         osd.18
                   19 hdd     2.73000  1.00000 2.8 TiB 2.0 TiB 771 GiB 72.68 0.92 162                         osd.19
                   20 hdd     2.73000  1.00000 2.8 TiB 2.0 TiB 733 GiB 74.04 0.93 149                         osd.20
                   21 hdd     2.73000  1.00000 2.8 TiB 2.2 TiB 533 GiB 81.12 1.02 155                         osd.21
                   22 hdd     2.73000  1.00000 2.8 TiB 2.1 TiB 692 GiB 75.48 0.95 144                         osd.22
                   23 hdd     2.73000  1.00000 2.8 TiB 1.6 TiB 1.1 TiB 58.43 0.74 130                         osd.23
                   24 hdd     2.73000  1.00000 2.8 TiB 2.2 TiB 579 GiB 79.51 1.00 146                         osd.24
                   25 hdd     2.73000  1.00000 2.8 TiB 1.9 TiB 886 GiB 68.63 0.87 147                         osd.25
                   31 hdd     5.45999  1.00000 5.5 TiB 4.7 TiB 758 GiB 86.50 1.09 326                         osd.31
                    6 ssd     0.74500  0.89999 744 GiB 640 GiB 104 GiB 86.01 1.09  61                         osd.6
                  -17        28.04494        -  28 TiB  22 TiB 6.3 TiB 77.61 0.98   -                     host arh-ibstorage4-ib
                    8 hdd     2.73000  1.00000 2.8 TiB 1.9 TiB 909 GiB 67.80 0.86 141                         osd.8
                   17 hdd     2.73000  1.00000 2.8 TiB 1.9 TiB 904 GiB 67.99 0.86 144                         osd.17
                   27 hdd     2.73000  1.00000 2.8 TiB 2.1 TiB 654 GiB 76.84 0.97 152                         osd.27
                   28 hdd     2.73000  1.00000 2.8 TiB 2.3 TiB 481 GiB 82.98 1.05 153                         osd.28
                   29 hdd     2.73000  1.00000 2.8 TiB 1.9 TiB 829 GiB 70.65 0.89 137                         osd.29
                   30 hdd     2.73000  1.00000 2.8 TiB 2.0 TiB 762 GiB 73.03 0.92 142                         osd.30
                   33 hdd     2.73000  1.00000 2.8 TiB 2.3 TiB 501 GiB 82.25 1.04 166                         osd.33
                   34 hdd     5.45998  1.00000 5.5 TiB 4.5 TiB 968 GiB 82.77 1.04 325                         osd.34
                   39 hdd     2.73000  0.95000 2.8 TiB 2.4 TiB 402 GiB 85.77 1.08 162                         osd.39
                   38 ssd     0.74500  1.00000 745 GiB 671 GiB  74 GiB 90.02 1.14  68                         osd.38
                                         TOTAL 113 TiB  90 TiB  23 TiB 79.25
                  MIN/MAX VAR: 0.74/1.14  STDDEV: 8.14

                6. Other useful info on the cluster:

                # for i in $(radosgw-admin bucket list | jq -r '.[]'); do
                    radosgw-admin bucket stats --bucket=$i | jq '.usage | ."rgw.main" | .size_kb'
                  done | awk '{ SUM += $1 } END { print SUM/1024/1024/1024 }'
                6.59098
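The shell loop above sums `size_kb`, which is in KiB, so dividing by 1024^3 yields TiB. This Python sketch computes the same sum and also totals `size_kb_actual`, which, to our understanding, is RGW's per-object rounded-up size. It assumes each input dict is the parsed JSON of `radosgw-admin bucket stats --bucket=NAME`. `bucket_usage_tib` is a hypothetical helper.

```python
from typing import Iterable, Tuple

# size_kb values are KiB, so KiB -> TiB is a division by 1024**3.
KIB_PER_TIB = 1024 ** 3


def bucket_usage_tib(bucket_stats: Iterable[dict]) -> Tuple[float, float]:
    """Sum size_kb and size_kb_actual of usage["rgw.main"] over buckets, in TiB.

    Each element is assumed to be parsed `radosgw-admin bucket stats
    --bucket=NAME` JSON; buckets with no "rgw.main" section count as zero.
    """
    size_kb = 0
    size_kb_actual = 0
    for stats in bucket_stats:
        main = stats.get("usage", {}).get("rgw.main", {})
        size_kb += main.get("size_kb", 0)
        size_kb_actual += main.get("size_kb_actual", 0)
    return size_kb / KIB_PER_TIB, size_kb_actual / KIB_PER_TIB
```

Reporting both figures shows how much of the gap, if any, is explained by RGW's own per-object rounding before BlueStore allocation is even considered.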

                # ceph df

                GLOBAL:
                    SIZE        AVAIL      RAW USED     %RAW USED
                    113 TiB     23 TiB       90 TiB         79.25

                POOLS:
                    NAME                           ID        USED     %USED     MAX AVAIL     OBJECTS
                    Primary-ubuntu-1                5      27 TiB     87.56       3.9 TiB     7302534
                    .users.uid                     15     6.8 KiB         0       3.9 TiB          39
                    .users                         16       335 B         0       3.9 TiB          20
                    .users.swift                   17        14 B         0       3.9 TiB           1
                    .rgw.buckets                   19      15 TiB     79.88       3.9 TiB     8787763
                    .users.email                   22         0 B         0       3.9 TiB           0
                    .log                           24     109 MiB         0       3.9 TiB      102301
                    .rgw.buckets.extra             37         0 B         0       2.6 TiB           0
                    .rgw.root                      44     2.9 KiB         0       2.6 TiB          16
                    .rgw.meta                      45     1.7 MiB         0       2.6 TiB        6249
                    .rgw.control                   46         0 B         0       2.6 TiB           8
                    .rgw.gc                        47         0 B         0       2.6 TiB          32
                    .usage                         52         0 B         0       2.6 TiB           0
                    .intent-log                    53         0 B         0       2.6 TiB           0
                    default.rgw.buckets.non-ec     54         0 B         0       2.6 TiB           0
                    .rgw.buckets.index             55         0 B         0       2.6 TiB       11485
                    .rgw                           56     491 KiB         0       2.6 TiB        1686
                    Primary-ubuntu-1-ssd           57     1.2 TiB     92.39       105 GiB      379516

                  I am not too sure the issue relates to the
                    BlueStore overhead, as I would probably have seen the
                    same discrepancy in my Primary-ubuntu-1 pool as well.
                    However, the data usage on the Primary-ubuntu-1 pool
                    seems to be consistent with my expectations (precise
                    numbers to be verified soon). The issue seems to be
                    only with the .rgw.buckets pool, where the "ceph df"
                    output shows 15 TiB of usage while the sum of all
                    buckets in that pool shows just over 6.5 TiB.

                  Cheers

                  Andrei

                  From: "Igor Fedotov" <ifedotov@xxxxxxx>
                  To: "andrei" <andrei@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
                  Sent: Tuesday, 2 July, 2019 10:58:54
                  Subject: Re:  troubleshooting space usage

                    Hi Andrei,
                    The most obvious reason is space usage overhead
                      caused by BlueStore allocation granularity: e.g.,
                      if bluestore_min_alloc_size is 64K and the average
                      object size is 16K, one will waste 48K per object
                      on average. This is just speculation so far, as
                      we lack key information about your cluster:
                    - Ceph version
                    - Main OSD devices: HDD or SSD
                    - BlueStore or FileStore
                    - Average RGW object size
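Igor's 64K/16K example is round-up arithmetic: each object occupies a whole number of `bluestore_min_alloc_size` units. The Python sketch below is a simplified model of that rounding. It ignores compression, metadata, RGW striping, and replication, and the helper names are hypothetical.

```python
from typing import Iterable


def allocated_bytes(object_size: int, min_alloc_size: int) -> int:
    """Bytes allocated for one object when rounding up to min_alloc_size.

    Simplified model: empty objects allocate no data blocks; compression,
    metadata, and replication are ignored.
    """
    if object_size == 0:
        return 0
    blocks = -(-object_size // min_alloc_size)  # ceiling division
    return blocks * min_alloc_size


def wasted_bytes(object_sizes: Iterable[int], min_alloc_size: int) -> int:
    """Total padding across objects under the same simplified model."""
    return sum(allocated_bytes(s, min_alloc_size) - s for s in object_sizes)
```

For example, `wasted_bytes([16 * 1024], 64 * 1024)` gives 49152 bytes, the 48K per object from the example. Running it over a sample of real object sizes gives a rough per-replica estimate of the padding.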

                    You might also want to collect and share
                      performance counter dumps (ceph daemon osd.N perf
                      dump) and " " reports from a couple of your OSDs.

                    Thanks,
                    Igor

                    On 7/2/2019 11:43 AM, Andrei Mikhailovsky wrote:

                        Bump!

                          From: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
                          To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
                          Sent: Friday, 28 June, 2019 14:54:53
                          Subject: troubleshooting space usage

                              Hi

                              Could someone please explain / show
                                how to troubleshoot the space usage in
                                Ceph and how to reclaim the unused
                                space?

                              I have a small cluster with 40 OSDs and a
                                replication size of 2, mainly used as a
                                backend for CloudStack as well as for the
                                S3 gateway. The used space doesn't make any
                                sense to me, especially for the RGW pool,
                                so I am seeking help.

                              Here is what I found from the client:

                              ceph -s shows:

                                usage:   89 TiB used, 24 TiB / 113 TiB avail

                              ceph df shows:

                              Primary-ubuntu-1                5      27 TiB     90.11       3.0 TiB     7201098
                              Primary-ubuntu-1-ssd           57     1.2 TiB     89.62       143 GiB      359260
                              .rgw.buckets                   19      15 TiB     83.73       3.0 TiB     8742222

                              The usage of the Primary-ubuntu-1 and
                                Primary-ubuntu-1-ssd pools is in line with
                                my expectations. However, the .rgw.buckets
                                pool seems to be using far too much. All
                                RGW buckets together show 6.5 TiB of usage
                                (the sum of the size_kb values from
                                "radosgw-admin bucket stats"). I am trying
                                to figure out why .rgw.buckets is using
                                15 TiB of space instead of the 6.5 TiB
                                shown by the bucket usage.

                              Thanks

                              Andrei

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
