So a few questions I have around this.

What is the network you have for this cluster?

Changing bluestore_min_alloc_size would be the last thing I would even consider. In fact I wouldn't be changing it, as you would be in untested territory.

The challenge with making these sorts of things perform is to generate lots of parallel streams, so whatever is doing the uploading needs to be doing parallel multipart uploads. There is no mention of the uploading code that is being used.

So with 7 nodes, each with 12 disks, and large files like this, I would expect to see 50-70MB/s per usable HDD. By usable I mean that if you are doing replicas you would divide the number of disks by the replica count, or in your case with EC I would be dividing the number of disks by the EC size and multiplying by the number of data chunks. So divide by 6 and multiply by 4, which turns 84 disks into 56 usable disks. So allowing for EC overhead you could in theory get beyond 2.8GBytes/s (56 x 50MB/s). That is the theoretical disk limit I would be looking to exceed.

So now the question is whether you have enough streams running in parallel. Have you tried a benchmarking tool such as MinIO warp to see what it can achieve?

You haven't mentioned the number of PGs you have for each of the pools in question. You need to ensure that every pool that is being used has more PGs than the number of disks. If that's not the case then individual disks could be slowing things down.

You also have the metadata pools used by RGW, which ideally need to be on NVMe.

Because you are using EC there is also the buckets.non-ec pool, which is used to manage the OMAPs for the multipart uploads. This is usually down at 8 PGs, and that will be limiting things as well.

Darren Soothill

Want a meeting with me: https://calendar.app.google/MUdgrLEa7jSba3du9

Looking for help with your Ceph cluster? Contact us at https://croit.io/

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io/ | YouTube: https://goo.gl/PGE1Bx

> On 25 May 2024, at 14:56, Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:
>
>> Hi Everyone,
>>
>> I'm putting together a HDD cluster with an EC pool dedicated to the backup
>> environment. Traffic via s3. Version 18.2, 7 OSD nodes, 12 * 12TB HDD +
>> 1NVME each,
>
> QLC, man. QLC. That said, I hope you're going to use that single NVMe SSD for at least the index pool. Is this a chassis with universal slots, or is that NVMe device maybe M.2 or rear-cage?
>
>> Wondering if there is some general guidance for startup setup/tuning in
>> regards to s3 object size.
>
> Small objects are the devil of any object storage system.
>
>> Files are read from fast storage (SSD/NVME) and
>> written to s3. File sizes are 10MB-1TB, so it's not standard s3 traffic.
>
> Nothing nonstandard about that, though your 1TB objects presumably are going to be MPU. Having the .buckets.non-ec pool on HDD with objects that large might be really slow to assemble them, you might need to increase timeouts but I'm speculating.
>
>> Backup for big files took hours to complete.
>
> Spinners gotta spin. They're a false economy.
>
>> My first shot would be to increase default bluestore_min_alloc_size_hdd, to
>> reduce the number of stored objects, but I'm not sure if it's a
>> good direction?
>
> With that workload you *could* increase that to like 64KB, but I don't think it'd gain you much.
>
>> Any other parameters worth checking to support such a
>> traffic pattern?
>
> `ceph df`
> `ceph osd dump | grep pool`
>
> So we can see what's going on HDD and what's on NVMe.
>
>> Thanks!
>>
>> --
>> Łukasz
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
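
For anyone wanting to redo the throughput estimate from the reply with their own numbers, here is a minimal Python sketch of the arithmetic. The function names are hypothetical, and the 4+2 EC profile is an assumption inferred from "divide by 6 and multiply by 4"; the 50-70MB/s per-disk figure is the rule of thumb given in the reply, not a measurement.

```python
def usable_disks(total_disks, ec_data, ec_parity):
    """Disks' worth of capacity left for client data after EC overhead.

    Assumes an EC profile of ec_data + ec_parity chunks (4 + 2 here,
    inferred from "divide by 6 and multiply by 4").
    """
    return total_disks * ec_data / (ec_data + ec_parity)


def throughput_range_mb_s(nodes, disks_per_node, ec_data, ec_parity,
                          per_disk_mb_s=(50, 70)):
    """Rough lower/upper aggregate write throughput in MB/s.

    per_disk_mb_s is the rule-of-thumb range per usable HDD for
    large sequential objects; adjust it for your own hardware.
    """
    usable = usable_disks(nodes * disks_per_node, ec_data, ec_parity)
    low, high = per_disk_mb_s
    return usable * low, usable * high


if __name__ == "__main__":
    # 7 nodes x 12 HDDs with an assumed 4+2 EC profile
    low, high = throughput_range_mb_s(7, 12, 4, 2)
    print(f"usable disks: {usable_disks(7 * 12, 4, 2):.0f}")
    print(f"estimated throughput: {low / 1000:.1f}-{high / 1000:.1f} GB/s")
```

This only models the disk-side ceiling. The network, the number of parallel multipart streams, and the PG counts on each pool can all cap throughput well below it.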