Re: question about rgw delete speed

Hi Brent,

Thanks for your input.

We will use Swift instead of S3. The deletes are mainly done by our customers using the sync app (i.e., they sync their local folders with their storage accounts, and every file change is translated into a delete in the cloud). We have a frontend cluster between the customers and the storage, providing access via FTP/HTTP/WebDAV and so on.
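
To illustrate, each such change ends up as a plain Swift call from the frontend against RGW, roughly like this sketch (endpoint, credentials and names are made up):

    from swiftclient.client import Connection

    # Hypothetical RGW Swift endpoint and credentials.
    conn = Connection(
        authurl="https://rgw.example.com/auth/v1.0",
        user="customer1:sync",
        key="secret",
    )

    # A file removed or replaced in the synced folder becomes a DELETE
    # on the matching object in that customer's container.
    conn.delete_object("customer1-files", "photos/2020/img_0042.jpg")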

The delete speed is important for us because we want to reclaim the 'deleted' storage capacity as quickly as possible so we can keep the costs down. I'm pretty obsessive about that because I went through some nightmares in the past when our storage was full for a while :).

For the network, we will probably use 1x10Gbps or 2x10Gbps for every OSD server.

Actually, I already have a Ceph cluster in production, but it's being used as secondary storage. We are moving the 'cold data' (i.e., the big files) from the primary storage there. We use it as secondary storage because it was deployed with 'zombie' OSD servers (old DDR2 servers recovered from other projects, but with new SATA drives). It's working very well so far and has helped us lower our costs by 30-40% per TB, but we cannot use it as primary.



On 11/13/2020 1:07 AM, Brent Kennedy wrote:
Ceph is definitely a good choice for storing millions of files.  It sounds like you plan to use this like S3, so my first question would be: are the deletes done for a specific reason? (e.g. the files are used for a process and discarded)  If it's an age thing, you can set the files to expire when putting them in, and Ceph will automatically clear them.
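
For S3 that would be a bucket lifecycle rule; since you mention Swift below, the Swift-side equivalent is the X-Delete-After / X-Delete-At headers, which rgw honors as far as I know. A rough python-swiftclient sketch (endpoint, names and the 30-day TTL are made up):

    from swiftclient.client import Connection

    conn = Connection(
        authurl="https://rgw.example.com/auth/v1.0",  # hypothetical endpoint
        user="tenant:user",
        key="secret",
    )

    # Upload with a TTL: rgw's expiration processing removes the object
    # about 30 days after the PUT, without any client-side delete.
    conn.put_object(
        "backups",
        "daily/dump.tar.gz",
        contents=b"...",
        headers={"X-Delete-After": str(30 * 24 * 3600)},
    )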

The more spinners you have, the more performance you will end up with.  Network 10Gb or higher?

Octopus is production stable and contains many performance enhancements.  Depending on the OS, you may not be able to upgrade from Nautilus until they work out that process (e.g. CentOS 7/8).

Delete speed is not that great, but you would have to test it with your cluster to see how it performs for your use case.  If you have enough space available, is there a process that breaks if the files are not deleted right away?
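
Also keep in mind that rgw deletes are asynchronous: the DELETE itself returns quickly, and the rgw garbage collector reclaims the space in the background, so "delete speed" for your capacity concern is really GC throughput. The rgw_gc_* options control how aggressive it is; a sketch with example values only (defaults quoted from memory, double-check the docs for your release):

    # ceph.conf sketch -- example values, not tested recommendations
    [client.rgw.gw1]                   # hypothetical rgw instance section
    rgw_gc_max_objs = 64               # default 32: more GC shards, more parallelism
    rgw_gc_obj_min_wait = 1800         # default 7200s: objects GC-eligible sooner
    rgw_gc_processor_period = 1800     # default 3600s: GC cycles run more often
    rgw_gc_max_concurrent_io = 20      # default 10: more concurrent GC I/O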


Regards,
-Brent

Existing Clusters:
Test: Octopus 15.2.5 (all virtual on NVMe)
US Production(HDD): Nautilus 14.2.11 with 11 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
UK Production(HDD): Nautilus 14.2.11 with 18 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
US Production(SSD): Nautilus 14.2.11 with 6 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
UK Production(SSD): Octopus 15.2.5 with 5 osd servers, 3 mons, 4 gateways




-----Original Message-----
From: Adrian Nicolae <adrian.nicolae@xxxxxxxxxx>
Sent: Wednesday, November 11, 2020 3:42 PM
To: ceph-users <ceph-users@xxxxxxx>
Subject: question about rgw delete speed


Hey guys,


I'm in charge of a local cloud-storage service. Our primary object storage is a vendor-based one, and I want to replace it in the near future with Ceph, with the following setup:

- 6 OSD servers, each with 36 SATA 16TB drives and 3 big NVMe drives (1 big NVMe for every 12 drives, so I can reserve 300GB of NVMe storage for every SATA drive), plus 3 MONs and 2 RGWs with EPYC 7402P and 128GB RAM. So in the end we'll have ~3PB of raw data and 216 SATA drives.
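
(That works out to 12 x 300GB = 3.6TB of DB/WAL space per NVMe and ~10.8TB per server, i.e. 300GB is just under 2% of each 16TB drive.)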

Currently we have ~100 million files on the primary storage, with the following distribution:

- ~10% = very small files (less than 1MB: thumbnails, text & office files and so on)
- ~60% = small files (between 1MB and 10MB)
- 20% = medium files (between 10MB and 1GB)
- 10% = big files (over 1GB).

My main concern is the speed of delete operations. We have around 500k-600k delete ops every 24 hours, so quite a lot. Our current storage is not deleting all the files fast enough (it's always 1 week to 10 days behind). I guess it's not only a software issue, and the delete speed will probably get better if we add more drives (we now have 108).
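
(For reference, 600k deletes per 24 hours averages out to only ~7 deletes/second over the day, although the load is presumably not spread evenly.)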

What do you think about Ceph's delete speed? I read in other threads that it's not very fast. I wonder if this hardware setup can handle our current delete load better than our current storage does. On the RGW servers I want to use Swift, not S3.
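
To get an idea before committing, I plan to measure it on a test setup with something like this rough python-swiftclient sketch (endpoint, credentials and sizes are made up):

    import time
    from swiftclient.client import Connection

    conn = Connection(
        authurl="https://rgw.example.com/auth/v1.0",  # hypothetical endpoint
        user="tenant:user",
        key="secret",
    )

    # Create some throwaway objects, then time how fast they can be deleted.
    conn.put_container("bench")
    names = ["obj-%06d" % i for i in range(1000)]
    for name in names:
        conn.put_object("bench", name, contents=b"x" * 4096)

    start = time.monotonic()
    for name in names:
        conn.delete_object("bench", name)
    elapsed = time.monotonic() - start

    print("%.1f deletes/sec" % (len(names) / elapsed))

A single-threaded loop understates what the cluster can do in parallel, but it should give a baseline to compare against our current storage.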

And another question: can I start deploying the latest Ceph version (Octopus) directly in production, or is it safer to start with Nautilus until Octopus becomes more stable?

Any input would be greatly appreciated!


Thanks,

Adrian.





_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



