On 04/05/2021 10:14, Sam Just wrote:
You could try increasing osd_recovery_max_active (setting it will
override the osd_recovery_max_active_hdd default of 3) and
osd_recovery_max_single_start (default 1) to recover more of the small
objects concurrently.
True, but that would increase the workload on the HDDs, which can (and
probably will) reduce performance.
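If you do want to experiment with those settings, something like this
should work on clusters with the centralized config store (the values
here are only an illustration, not a recommendation):

  ceph config set osd osd_recovery_max_active 6
  ceph config set osd osd_recovery_max_single_start 2

On older releases the values can be injected into the running OSDs instead:

  ceph tell 'osd.*' injectargs '--osd_recovery_max_active=6 --osd_recovery_max_single_start=2'

Keep an eye on client latency while they are raised.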
Small objects are always a problem simply because they are small. This
seems to apply to any object store and/or filesystem.
I always say: nothing is free in this world.
Meaning, writing a lot of small files might be easier (cheaper) on the
developer's side, but it's more difficult (expensive) on the storage side.
Each object in RADOS carries a certain amount of per-object overhead,
and that results in slower backfill/recovery. You will notice this
especially on HDDs.
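A quick back-of-envelope illustration (assumed numbers, purely to show
the scaling):

  1 TB at 4 MB   per object ≈    262,000 objects
  1 TB at 100 KB per object ≈ 10,500,000 objects  (40x as many)

Each of those objects costs a round of per-object work (push, attr/omap
copy, commit), so once the HDDs are bound by operations per second
rather than by bandwidth, recovery time scales with the object count,
not with the bytes.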
Yes, reducing the number of objects by making them larger helps, but
that shifts the problem to the RGW user.
In short: larger objects will improve your recovery performance.
Wido
-Sam
On Sun, May 2, 2021 at 8:27 PM Prasad Krishnan
<prasad.krishnan@xxxxxxxxxxxx> wrote:
Dear Ceph Developers,
On our Ceph S3 storage clusters we have found that recovery/backfill on the cluster whose S3/RADOS objects are between 10 KB and 100 KB takes much longer than on our other cluster, whose S3 objects are usually a few tens of MB (with "rgw_obj_stripe_size" set to 4 MB, so its RADOS objects are 4 MB or smaller).
We're exploring ways to improve the recovery speed while keeping the following factors constant (tweaking them would lead to other issues):
- Type of media: this stays HDD, as moving all data to SSD would be prohibitively expensive.
- "osd_max_backfills": we do not want to increase this, as it leads to blocked requests and interferes with client I/O; we suspect the disks' requests per second get saturated when it is increased.
- PG count: increasing this would push memory usage beyond what is available to the OSDs.
I came across the same question posted on this forum a few years back, but it seems to have received no answers. Refer to this and this.
Can the community help me understand what is theoretically causing this slowness? Is the per-object overhead of recovery (grabbing a lock on the PG, transaction overhead) so high that any increase in the object count decreases recovery throughput?
Should I instead tweak our workloads to avoid generating small S3/RADOS objects, so that the MTTR of our cluster improves?
Thanks,
Prasad Krishnan
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx