Re: large bucket index in multisite environment (how to deal with large omap objects warning)?

I am just creating a bucket with a lot of files to test it. Who would have
thought that uploading a million 1k files would take days?
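
Side note: parallelising the upload helps a lot. A minimal sketch with
rclone (which also comes up later in this thread); the "s3:" remote name
and the transfer counts are placeholders to tune:

rclone copy --transfers=64 --checkers=128 ./testdata s3:testbucket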

On Tue, Nov 9, 2021 at 00:50, prosergey07 <prosergey07@xxxxxxxxx> wrote:

> When resharding is performed, I believe it is treated as a bucket
> operation and goes through an update of the bucket stats: for example, a
> new bucket shard is created, and that may increase the object count in
> the bucket stats.
>  If it was broken during resharding, you could check the current bucket
> id with:
>  radosgw-admin metadata get "bucket:BUCKET_NAME"
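>
> For example, the bucket id can be pulled straight out of that output (a
> sketch; assumes jq is available, and the actual id will differ):
>
> radosgw-admin metadata get "bucket:BUCKET_NAME" | jq -r '.data.bucket.bucket_id'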
>
> That would give you an idea of which bucket index objects to keep.
>
>  Then you could remove the corrupted bucket shards (not the ones carrying
> the bucket id from the previous command), i.e. the
> .dir.corrupted_bucket_index.SHARD_NUM objects, from the bucket.index pool:
>
> rados -p bucket.index rm .dir.corrupted_bucket_index.SHARD_NUM
>
> Where SHARD_NUM is the shard number you want to delete.
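>
> To double-check before deleting anything, you can first list the index
> objects for a given bucket id (BUCKET_ID being what the metadata command
> above returned):
>
> rados -p bucket.index ls | grep '.dir.BUCKET_ID'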
>
>  Then run "radosgw-admin bucket check --fix --bucket=BUCKET_NAME".
>
>  That should resolve your issue with the object count.
>
>  As for the slow object deletion: do you run your RGW metadata pools,
> specifically the bucket.index pool, on NVMe drives? The problem is that
> you have a lot of objects and probably not enough shards. radosgw
> retrieves the list of objects from bucket.index and, if I remember
> correctly, it retrieves them as an ordered list, which is a very
> expensive operation. Hence a lot of the time might be spent just on
> getting the object list.
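>
> You can gauge how loaded each shard is by counting its omap keys (a
> sketch; the default "large omap" warning threshold is around 200k keys
> per index object, if I remember correctly):
>
> rados -p bucket.index listomapkeys .dir.BUCKET_ID.SHARD_NUM | wc -l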
>
>  We see about 1000 object deletions per second in our storage.
>
>
> I would not recommend using "--inconsistent-index", to avoid further
> consistency issues.
>
>
>
>
> Sent from a Galaxy device
>
>
> -------- Original message --------
> From: mhnx <morphinwithyou@xxxxxxxxx>
> Date: 08.11.21 13:28 (GMT+02:00)
> To: Сергей Процун <prosergey07@xxxxxxxxx>
> Cc: "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>, Boris Behrens <
> bb@xxxxxxxxx>, Ceph Users <ceph-users@xxxxxxx>
> Subject: Re: large bucket index in multisite environment
> (how to deal with large omap objects warning)?
>
> (There should not be any issues using rgw for other buckets while
> re-sharding.)
> If there are, then disabling access to the bucket should work, right?
> Sync should also be disabled.
>
> Yes, after the manual reshard it should clear the leftovers, but in my
> situation resharding failed and I got double entries for that bucket.
> I didn't push further; instead I divided the bucket into new buckets and
> reduced the object count with a new bucket tree. I copied all of the
> objects with rclone and started the bucket removal with "radosgw-admin
> bucket rm --bucket=mybucket --bypass-gc --purge-objects
> --max-concurrent-ios=128". It has been running a very long time (started
> on Sep 08) and is still going. There were 250M objects in that bucket,
> and after the manual reshard failure I got a 500M object count when
> checking bucket stats num_objects. Now I have:
> "size_kb": 10648067645,
> "num_objects": 132270190
>
> The removal speed is 50-60 objects per second. It's not because of the
> cluster speed; the cluster is fine.
> I have the space, so I'm letting it run. When I see a stable object
> count I will stop the removal process and start again with the
> "--inconsistent-index" parameter.
> I wonder, is it safe to use that parameter with referenced objects? I
> want to learn how "--inconsistent-index" works and what it does.
>
> On Fri, Nov 5, 2021 at 17:46, Сергей Процун <prosergey07@xxxxxxxxx>
> wrote:
>
>> There should not be any issues using rgw for other buckets while
>> re-sharding.
>>
>> The doubling of the number of objects after a reshard is an interesting
>> situation. After a manual reshard is done, there might be leftovers from
>> the old bucket index, since during a reshard new .dir.new_bucket_index
>> objects are created. They contain all the metadata for the objects that
>> are stored in the buckets.data pool. I'm just wondering whether the
>> doubled object count was related to the old bucket index. If so, it's
>> safe to delete the old bucket index.
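>>
>> On newer releases there is also tooling for exactly this kind of
>> leftover; from memory (verify against the docs for your version):
>>
>> radosgw-admin reshard stale-instances list
>> radosgw-admin reshard stale-instances rm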
>>
>>  In a perfect world, it would be ideal to know the eventual number of
>> objects inside the bucket and set the number of shards accordingly from
>> the start.
>>
>>  In the real world, when the client re-purposes a bucket, we have to
>> deal with reshards.
>>
>> On Fri, Nov 5, 2021 at 14:43, mhnx <morphinwithyou@xxxxxxxxx> wrote:
>>
>>> I also use this method and I hate it.
>>>
>>> Stopping all of the RGW clients is never an option! It shouldn't be.
>>> Sharding is hell. I had 250M objects in a bucket, the reshard failed
>>> after 2 days, and the object count somehow doubled! 2 days of downtime
>>> is not an option.
>>>
>>> I wonder: if I stop reads and writes on a bucket while resharding it,
>>> is there any problem using the RGWs with all the other buckets?
>>>
>>> Nowadays I advise splitting buckets as much as you can! That means
>>> changing your app's directory tree, but this design requires it.
>>> You need to plan the object count for at least 5 years ahead and create
>>> the buckets accordingly.
>>> Usually I use 101 shards, which covers about 10,100,000 objects (at
>>> roughly 100,000 objects per shard; see the sketch just below).
>>> Also, if I need versioning I use 2x101 or 3x101 shards, because
>>> versions are hard to predict. You need to predict how many versions you
>>> need and set a lifecycle even before using the bucket!
>>> The max shard count I use is 1999. I'm not happy about it, but
>>> sometimes you gotta do what you need to do.
>>> Fighting with customers is not an option; you can only advise changing
>>> their app's folder tree, but I've never seen someone accept the deal
>>> without arguing.
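>>>
>>> As a sketch of the sizing arithmetic (my rule of thumb of ~100,000
>>> objects per shard, rounded up to the next prime; uses coreutils factor,
>>> and the object count is a made-up example):
>>>
>>> objects=10100000                       # expected object count (example)
>>> n=$(( (objects + 99999) / 100000 ))    # ceil(objects / 100k per shard)
>>> while [ "$(factor $n | wc -w)" -ne 2 ]; do n=$((n + 1)); done
>>> echo "$n shards"                       # next prime >= n -> 101 here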
>>>
>>> My offers usually look like this:
>>> 1- Core files bucket: no need to change, or very limited changes.
>>> Calculate the object count and multiply by 2.
>>> 2- Hot data bucket: there will be daily changes and versioning.
>>> Calculate the object count and multiply by 3.
>>> 3- Cold data bucket[s]: there will be no daily changes. You should open
>>> new buckets every year or month. This keeps things clean and steady. No
>>> need for versioning, and multisite will not suffer, since changes are
>>> rare.
>>> 4- Temp files bucket[s]: this is so important. If you're crawling
>>> millions upon millions of objects every day and deleting them at the
>>> end of the week or month, then you should definitely use a temp bucket.
>>> No versioning, no multisite, and no index if possible (see the
>>> indexless sketch below).
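>>>
>>> For the "no index" part: indexless buckets are configured per placement
>>> target. From memory (flag names may differ between releases, so check
>>> the docs for yours), roughly:
>>>
>>> radosgw-admin zonegroup placement add --rgw-zonegroup=default \
>>>     --placement-id=indexless-placement
>>> radosgw-admin zone placement add --rgw-zone=default \
>>>     --placement-id=indexless-placement \
>>>     --data-pool=default.rgw.buckets.data \
>>>     --index-pool=default.rgw.buckets.index \
>>>     --data-extra-pool=default.rgw.buckets.non-ec \
>>>     --placement-index-type=indexless
>>>
>>> Keep in mind that indexless buckets cannot be listed and do not work
>>> with multisite sync.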
>>>
>>>
>>>
>>> On Fri, Nov 5, 2021 at 12:30, Szabo, Istvan (Agoda)
>>> <Istvan.Szabo@xxxxxxxxx> wrote:
>>>
>>> > You mean prepare or preshard?
>>> > Prepare:
>>> > I collect as much information as possible about the users before
>>> > onboarding, so I can prepare for their future use case and set things
>>> > up.
>>> >
>>> > Preshard:
>>> > After creating the bucket:
>>> > radosgw-admin bucket reshard --bucket=ex-bucket --num-shards=101
>>> >
>>> > Also, when you shard the buckets, you need to use prime numbers.
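>>> >
>>> > An alternative to resharding right after creation is to set a default
>>> > shard count for newly created buckets; from memory the option is
>>> > rgw_override_bucket_index_max_shards:
>>> >
>>> > ceph config set client.rgw rgw_override_bucket_index_max_shards 101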
>>> >
>>> > Istvan Szabo
>>> > Senior Infrastructure Engineer
>>> > ---------------------------------------------------
>>> > Agoda Services Co., Ltd.
>>> > e: istvan.szabo@xxxxxxxxx
>>> > ---------------------------------------------------
>>> >
>>> > From: Boris Behrens <bb@xxxxxxxxx>
>>> > Sent: Friday, November 5, 2021 4:22 PM
>>> > To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>; ceph-users@xxxxxxx
>>> > Subject: Re: large bucket index in multisite environment
>>> > (how to deal with large omap objects warning)?
>>> >
>>> > Cheers Istvan,
>>> >
>>> > how do you do this?
>>> >
>>> > On Thu, Nov 4, 2021 at 19:45, Szabo, Istvan (Agoda)
>>> > <Istvan.Szabo@xxxxxxxxx> wrote:
>>> > This one you need to prepare for: you need to preshard any bucket
>>> > that you know will hold millions of objects.
>>> >
>>> > I have a bucket where we store 1.2 billion objects with 24xxx shards.
>>> > No omap issues.
>>> > Istvan Szabo
>>> > Senior Infrastructure Engineer
>>> > ---------------------------------------------------
>>> > Agoda Services Co., Ltd.
>>> > e: istvan.szabo@xxxxxxxxx
>>> > ---------------------------------------------------
>>> >
>>> >
>>> >
>>> > --
>>> > The self-help group "UTF-8 Problems" will meet this time, as an
>>> > exception, in the big hall.
>>

-- 
The self-help group "UTF-8 Problems" will meet this time, as an exception,
in the big hall.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



