Re: s3 requires twice the space it should use


 



So I am following the orphans trail.

Now I have a job that has been running for 3.5 days. Can I hit "finish" on a
job that is in the comparing state? It has been in this stage for 2 days, and
the messages in the output keep repeating and look like this:

leaked:
ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.578__shadow_95a2980b-7012-43dd-81f2-07577cfcb9f0/25bc235b-3bf9-4db2-ace0-d149653bfd8b/e79909ed-e52e-4d16-a3a9-e84e332d37fa.lz4.2~YVwbe-JPoLioLOSEQtTwYOPt_wmUCHn.4310_2
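
For reference, one way to sanity-check an entry from the "leaked" list before
acting on it is to stat the RADOS object directly (a sketch; <leaked-object-name>
is a placeholder for one of the names printed above):

    rados -p eu-central-1.rgw.buckets.data stat '<leaked-object-name>'
    # prints the object's size and mtime if it still exists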

This is the find job. I created it with:

radosgw-admin orphans find --job-id bb-orphan-2021-04-19 --bucket=BUCKET \
    --yes-i-really-mean-it --pool eu-central-1.rgw.buckets.data

Its current state is:
    {
        "orphan_search_state": {
            "info": {
                "orphan_search_info": {
                    "job_name": "bb-orphan-2021-04-19",
                    "pool": "eu-central-1.rgw.buckets.data",
                    "num_shards": 64,
                    "start_time": "2021-04-19 16:42:45.993615Z"
                }
            },
            "stage": {
                "orphan_search_stage": {
                    "search_stage": "comparing",
                    "shard": 0,
                    "marker": ""
                }
            }
        }
    },
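
For reference, a sketch of how the job could be inspected and wrapped up once the
search output has been consumed; as far as I can tell from the radosgw-admin help,
"orphans finish" only removes the intermediate search data for the given job id
and does not delete any objects:

    radosgw-admin orphans list-jobs                            # list registered orphan search jobs
    radosgw-admin orphans finish --job-id bb-orphan-2021-04-19 # clean up this job's search data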

On Fri, 16 Apr 2021 at 10:57, Boris Behrens <bb@xxxxxxxxx> wrote:

> Could this also be failed multipart uploads?
>
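
For reference, a sketch of how incomplete multipart uploads could be listed from
the S3 side (assuming s3cmd or the aws CLI is configured against this RGW endpoint;
BUCKET is the bucket used in the orphans job and <rgw-endpoint> is a placeholder):

    s3cmd multipart s3://BUCKET
    # or
    aws s3api list-multipart-uploads --bucket BUCKET --endpoint-url https://<rgw-endpoint>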
> On Thu, 15 Apr 2021 at 18:23, Boris Behrens <bb@xxxxxxxxx> wrote:
>
>> Cheers,
>>
>> [root@s3db1 ~]#  ceph daemon osd.23 perf dump | grep numpg
>>         "numpg": 187,
>>         "numpg_primary": 64,
>>         "numpg_replica": 121,
>>         "numpg_stray": 2,
>>         "numpg_removing": 0,
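
For reference, a small sketch for checking the same counters on every OSD running
on a host, assuming the default admin-socket paths under /var/run/ceph:

    for sock in /var/run/ceph/ceph-osd.*.asok; do
        echo "== $sock"
        ceph daemon "$sock" perf dump | grep numpg
    done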
>>
>>
>> On Thu, 15 Apr 2021 at 18:18, 胡 玮文 <huww98@xxxxxxxxxxx> wrote:
>>
>>> Hi Boris,
>>>
>>> Could you check something like
>>>
>>> ceph daemon osd.23 perf dump | grep numpg
>>>
>>> to see if there are some stray or removing PG?
>>>
>>> Weiwen Hu
>>>
>>> > On 15 Apr 2021, at 22:53, Boris Behrens <bb@xxxxxxxxx> wrote:
>>> >
>>> > Ah you are right.
>>> > [root@s3db1 ~]# ceph daemon osd.23 config get bluestore_min_alloc_size_hdd
>>> > {
>>> >    "bluestore_min_alloc_size_hdd": "65536"
>>> > }
>>> > But I also checked how many objects our s3 holds, and the numbers just
>>> > do not add up.
>>> > There are only 26509200 objects, which would result in around 1 TB of
>>> > "waste" even if every object were empty.
>>> >
>>> > I think the problem began when I updated the PG count from 1024 to 2048.
>>> > Could there be an issue where the data is written twice?
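
For reference, a quick back-of-the-envelope check of that "waste" estimate
(a sketch; it assumes bluestore_min_alloc_size_hdd = 65536 as shown above and
takes the worst case of one fully wasted allocation unit per object):

    python3 -c 'print(26509200 * 65536 / 2**40, "TiB")'
    # worst-case allocation overhead: ~1.58 TiB, in the same ballpark as the ~1 TB above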
>>> >
>>> >
>>> >> On Thu, 15 Apr 2021 at 16:48, Amit Ghadge <amitg.b14@xxxxxxxxx> wrote:
>>> >>
>>> >> Verify those two parameter values, bluestore_min_alloc_size_hdd and
>>> >> bluestore_min_alloc_size_ssd. If you are using HDD disks, then
>>> >> bluestore_min_alloc_size_hdd applies.
>>> >>
>>> >>> On Thu, Apr 15, 2021 at 8:06 PM Boris Behrens <bb@xxxxxxxxx> wrote:
>>> >>>
>>> >>> So I need to live with it? Does a value of zero mean the default is used?
>>> >>> [root@s3db1 ~]# ceph daemon osd.23 config get bluestore_min_alloc_size
>>> >>> {
>>> >>>    "bluestore_min_alloc_size": "0"
>>> >>> }
>>> >>>
>>> >>> I also checked the fragmentation on the bluestore OSDs and it is around
>>> >>> 0.80 - 0.89 on most OSDs. Yikes.
>>> >>> [root@s3db1 ~]# ceph daemon osd.23 bluestore allocator score block
>>> >>> {
>>> >>>    "fragmentation_rating": 0.85906054329923576
>>> >>> }
>>> >>>
>>> >>> The problem I currently have is that I can barely keep up with adding
>>> >>> OSD disks.
>>> >>>
>>> >>> On Thu, 15 Apr 2021 at 16:18, Amit Ghadge <amitg.b14@xxxxxxxxx> wrote:
>>> >>>
>>> >>>> size_kb_actual is the actual bucket object size, but at the OSD level
>>> >>>> the bluestore_min_alloc_size default is 64KB for HDDs and 16KB for SSDs.
>>> >>>>
>>> >>>> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/administration_guide/osd-bluestore
>>> >>>>
>>> >>>> -AmitG
>>> >>>>
>>> >>>> On Thu, Apr 15, 2021 at 7:29 PM Boris Behrens <bb@xxxxxxxxx> wrote:
>>> >>>>
>>> >>>>> Hi,
>>> >>>>>
>>> >>>>> Maybe it is just a problem in my understanding, but it looks like our
>>> >>>>> s3 requires twice the space it should use.
>>> >>>>>
>>> >>>>> I ran "radosgw-admin bucket stats", added up all "size_kb_actual"
>>> >>>>> values, and divided to get TB (/1024/1024/1024).
>>> >>>>> The resulting space is 135.1636733 TB. When I triple it because of
>>> >>>>> replication, I end up with around 405 TB, which is nearly half the
>>> >>>>> space of what ceph df tells me.
>>> >>>>>
>>> >>>>> Hope someone can help me.
>>> >>>>>
>>> >>>>> ceph df shows:
>>> >>>>>
>>> >>>>> RAW STORAGE:
>>> >>>>>     CLASS     SIZE         AVAIL       USED        RAW USED     %RAW USED
>>> >>>>>     hdd       1009 TiB     189 TiB     820 TiB     820 TiB          81.26
>>> >>>>>     TOTAL     1009 TiB     189 TiB     820 TiB     820 TiB          81.26
>>> >>>>>
>>> >>>>> POOLS:
>>> >>>>>     POOL                               ID     PGS     STORED      OBJECTS     USED        %USED     MAX AVAIL
>>> >>>>>     rbd                                 0      64        0 B           0        0 B           0        18 TiB
>>> >>>>>     .rgw.root                           1      64      99 KiB        119      99 KiB         0        18 TiB
>>> >>>>>     eu-central-1.rgw.control            2      64        0 B           8        0 B           0        18 TiB
>>> >>>>>     eu-central-1.rgw.data.root          3      64     1.0 MiB      3.15k     1.0 MiB         0        18 TiB
>>> >>>>>     eu-central-1.rgw.gc                 4      64      71 MiB         32      71 MiB         0        18 TiB
>>> >>>>>     eu-central-1.rgw.log                5      64     267 MiB        564     267 MiB         0        18 TiB
>>> >>>>>     eu-central-1.rgw.users.uid          6      64     2.8 MiB      6.91k     2.8 MiB         0        18 TiB
>>> >>>>>     eu-central-1.rgw.users.keys         7      64     263 KiB      6.73k     263 KiB         0        18 TiB
>>> >>>>>     eu-central-1.rgw.meta               8      64     384 KiB         1k     384 KiB         0        18 TiB
>>> >>>>>     eu-central-1.rgw.users.email        9      64       40 B           1       40 B           0        18 TiB
>>> >>>>>     eu-central-1.rgw.buckets.index     10      64      10 GiB     67.61k      10 GiB      0.02        18 TiB
>>> >>>>>     eu-central-1.rgw.buckets.data      11    2048     264 TiB    138.31M     264 TiB     83.37        18 TiB
>>> >>>>>     eu-central-1.rgw.buckets.non-ec    12      64     297 MiB     11.32k     297 MiB         0        18 TiB
>>> >>>>>     eu-central-1.rgw.usage             13      64     536 MiB         32     536 MiB         0        18 TiB
>>> >>>>>     eu-msg-1.rgw.control               56      64        0 B           8        0 B           0        18 TiB
>>> >>>>>     eu-msg-1.rgw.data.root             57      64      72 KiB        227      72 KiB         0        18 TiB
>>> >>>>>     eu-msg-1.rgw.gc                    58      64     300 KiB         32     300 KiB         0        18 TiB
>>> >>>>>     eu-msg-1.rgw.log                   59      64     835 KiB        242     835 KiB         0        18 TiB
>>> >>>>>     eu-msg-1.rgw.users.uid             60      64      56 KiB        104      56 KiB         0        18 TiB
>>> >>>>>     eu-msg-1.rgw.usage                 61      64      37 MiB         25      37 MiB         0        18 TiB
>>> >>>>>     eu-msg-1.rgw.users.keys            62      64     3.8 KiB         97     3.8 KiB         0        18 TiB
>>> >>>>>     eu-msg-1.rgw.meta                  63      64     607 KiB      1.60k     607 KiB         0        18 TiB
>>> >>>>>     eu-msg-1.rgw.buckets.index         64      64      71 MiB        119      71 MiB         0        18 TiB
>>> >>>>>     eu-msg-1.rgw.users.email           65      64        0 B           0        0 B           0        18 TiB
>>> >>>>>     eu-msg-1.rgw.buckets.data          66      64     2.9 TiB      1.16M     2.9 TiB      5.30        18 TiB
>>> >>>>>     eu-msg-1.rgw.buckets.non-ec        67      64     2.2 MiB        354     2.2 MiB         0        18 TiB
>>> >>>>>     default.rgw.control                69      32        0 B           8        0 B           0        18 TiB
>>> >>>>>     default.rgw.data.root              70      32        0 B           0        0 B           0        18 TiB
>>> >>>>>     default.rgw.gc                     71      32        0 B           0        0 B           0        18 TiB
>>> >>>>>     default.rgw.log                    72      32        0 B           0        0 B           0        18 TiB
>>> >>>>>     default.rgw.users.uid              73      32        0 B           0        0 B           0        18 TiB
>>> >>>>>     fra-1.rgw.control                  74      32        0 B           8        0 B           0        18 TiB
>>> >>>>>     fra-1.rgw.meta                     75      32        0 B           0        0 B           0        18 TiB
>>> >>>>>     fra-1.rgw.log                      76      32       50 B          28       50 B           0        18 TiB
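
For reference, a sketch of the summation described above, assuming the JSON layout
that "radosgw-admin bucket stats" prints (a list of buckets with per-bucket usage
under usage."rgw.main".size_kb_actual) and that jq is available:

    radosgw-admin bucket stats \
      | jq '[.[] | .usage["rgw.main"].size_kb_actual // 0] | add / 1024 / 1024 / 1024'
    # sums size_kb_actual over all buckets and converts KiB -> TiB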
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>>
>
>
>


-- 
This time, as an exception, the self-help group "UTF-8 problems" is meeting in
the large hall.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



