Re: Pacific: access via S3 / Object gateway slow for small files

On Tue, 24 Aug 2021 at 09:12, E Taka <0etaka0@xxxxxxxxx> wrote:
> As a simple test I copied an Ubuntu /usr/share/doc (580 MB in 23'000 files):
>
> - rsync -a to a Cephfs took 2 min
> - s3cmd put --recursive took over 70 min
> Users reported that the S3 access is generally slow, not only with s3tools.

Per-object reads and writes over S3 are slower, since each request
involves both client- and server-side checksumming plus a fair amount
of HTTP(S) setup before the actual operation starts, and I don't
think much connection reuse or pipelining is being done, so you are
going to make some 23k requests, each taking a non-zero time to
complete.
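
To put rough numbers on it: 70 minutes over 23'000 objects is about
180 ms per request, while the rsync run averaged roughly 5 ms per
file, so the per-request overhead rather than the raw bandwidth is
what dominates here.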

> So my question is: How do we speed up access via S3?

Put every rgw pool except the data pool on ssd/nvme, configure the
clients, and use a client (or several clients) that can parallelize
uploads, to amortize the cost of setting up HTTP for each transfer.
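
As a sketch of the pool placement part (this assumes your OSDs carry
the ssd device class, and the pool names below are the defaults for
the "default" zone, so adjust both to your setup):

# hypothetical rule name; creates a replicated rule limited to ssd OSDs
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd pool set default.rgw.buckets.index crush_rule replicated-ssd
ceph osd pool set default.rgw.meta crush_rule replicated-ssd
ceph osd pool set default.rgw.log crush_rule replicated-ssd

That keeps index and metadata operations on fast media while the
object data itself stays on the (larger, slower) data pool.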

Try rclone as a client, it has good options for multi-stream uploads.
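
Something like this (the remote name "ceph" and bucket "docs" are
placeholders, assuming the remote is already set up with rclone
config):

# --transfers controls how many objects are uploaded in parallel
rclone copy /usr/share/doc ceph:docs/doc --transfers 32 --checkers 16

With many small objects, the number of parallel transfers matters far
more than per-stream bandwidth.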

For s3cmd, if you want to compare the actual write speed rather than
the client-side checksumming speed, you can run the same upload twice
with an S3-side delete in between, asking it to cache the calculated
MD5 checksums so the second run skips recomputing them (though
clients are usually rather fast at that nowadays).
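
Roughly like this (bucket name and cache path are placeholders; the
--cache-file option is tied to sync, so I use sync rather than put
here):

# first run computes and caches the local MD5s
s3cmd sync --cache-file=/tmp/md5.cache /usr/share/doc/ s3://testbucket/doc/
s3cmd del --recursive s3://testbucket/doc/
# second run reuses the cached MD5s, so it measures mostly the
# request/transfer path
s3cmd sync --cache-file=/tmp/md5.cache /usr/share/doc/ s3://testbucket/doc/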

Also, s3cmd in my case goes faster if I set, in ~/.s3cfg:

send_chunk = 262144
recv_chunk = 262144

but this sort of assumes larger objects, which at 580 MB over 23'000
files (roughly 25 kB per object) you don't really have.

If your use case really matches /usr/share/doc, then I would suggest
compressing the lot into one blob and storing that. It makes a lot of
things better on the S3 side, both in transfer performance and in the
amount of metadata S3 has to retain for each object (which in many
cases is just useless overhead).
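
For instance:

tar czf doc.tar.gz -C /usr/share doc
s3cmd put doc.tar.gz s3://testbucket/   # bucket name is a placeholder

That is one PUT and one set of object metadata instead of 23'000 of
each.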

S3 is not a network filesystem, so treating it like one will not do
anyone any favours. It is more like ftp, whereas cephfs is more like
nfs or smb/cifs. Each has strengths and weaknesses and should be used
where it works best, and the cases where S3 or cephfs is the better
fit do not overlap much.

-- 
May the most significant bit of your life be positive.