>> Also, 1 CPU core/OSD is definitely undersized. I'm not sure how much
>> you have -- but you want at least a couple per OSD for SSD, and even
>> more for NVMe... especially when it comes to small block write
>> workloads.

Think you meant s/SSD/SAS|SATA/

If the OP means physical cores: granted, the CPUs are probably as old
as the SSDs, but they likely still have HT enabled, so one core would
be two threads, which for such small drives isn't so awful.

The OP implies that the cluster's performance *degraded* with the
Quincy upgrade. I wonder if there was a kernel change at the same time.

> have a cluster that is performing *very* poorly.
>
> It has 12 physical servers and 72 400GB SATA Intel DC mixed-use SSD
> OSDs (S3700's & S3710's).

At first I thought you meant P3700 or another SKU, since I'd never
heard of these (and I used to work for the group that planned the
product roadmap). Then I looked them up and saw that they are likely
8-11 years old.

Do you have the latest available firmware installed on them? Did you
perform a secure-erase on each before deploying? What manner of HBA is
driving them? The first-generation NVMe AIC SKUs definitely had issues
with their initial firmware.

> All servers have bonded 10GB NICs for 20Gbps
> aggregate networking to a dedicated switch.

Public network only, no replication network?

> At least one CPU core

Core, or hyperthread? What CPU SKU?

> and
> 8GB RAM per OSD. No SMART errors. All SSDs have media wearout
> indicators >90% remaining.
>
> The cluster is Quincy 17.2.6 under cephadm. (It was recently upgraded
> from Octopus.) Its primary use is serving RBD VM block devices for
> Proxmox, but it is a standalone cluster. (The RBD bench above was run
> from one of the mon servers, so Proxmox is not at all involved here.)

So no VMs on them, right?

> I also captured iostat -x 5 from several of the OSDs during the test,
> including one of the ones identified as having blocked IO. It shows
> 3-15% utilized

iostat %util is not super useful on SSDs, FWIW.

> Is there anything to be done to further investigate or remediate this?
> This cluster is pretty old and I'm starting to think we should just
> scrap it as hopeless. But that's a lot of hardware to write off if
> there's any possible way to fix this.

Has that hardware been in service that long?

It may sound like a cop-out, but I'd look hard at the networking. Look
for dropped / retransmitted packets, and for framing / CRC errors on
the switch side. Maybe use fping or gping, and iperf3. Check the
interfaces to ensure they have the proper netmasks and default routes;
I've seen misconfigured systems that sent intra-subnet traffic up to
the router for exactly that reason.
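
A few concrete starting points for those network checks, for whatever
they're worth. To look for drops and errors on the hosts themselves,
something like the below; interface names are placeholders, and you'll
want to check each bond member individually, not just the bond device:

    # Link-level errors/drops on each bond member
    ip -s link show dev enp1s0f0
    ethtool -S enp1s0f0 | egrep -i 'err|drop|crc|fcs'
    # Kernel-wide TCP retransmit counter; sample before and after a
    # bench run and compare
    nstat -az TcpRetransSegs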
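
For raw throughput between host pairs, the usual iperf3 dance (the IP
is a placeholder for one of your OSD hosts), plus an fping sweep to
spot loss or latency outliers:

    # On one host:
    iperf3 -s
    # From another host; -R repeats the test in the reverse direction
    iperf3 -c 10.0.0.11 -P 4 -t 30
    iperf3 -c 10.0.0.11 -P 4 -t 30 -R
    # 100 pings to every OSD host, summary only
    fping -c 100 -q host1 host2 host3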
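
And for the netmask / default-route misconfiguration I mentioned,
`ip route get` makes it obvious; again the peer address is a
placeholder:

    # Addresses and masks at a glance
    ip -br addr show
    # For a peer on the SAME subnet this should NOT say "via <router>"
    ip route get 10.0.0.12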
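
To answer the core-vs-hyperthread question on your end, lscpu will show
the SKU and the topology; threads-per-core of 2 means HT is enabled:

    lscpu | egrep -i 'model name|socket|core|thread'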
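
Likewise for the firmware question, smartctl reports the revision per
drive, and if you ever redeploy these, an ATA secure-erase resets the
FTL. A sketch only -- /dev/sdX is a placeholder, the erase DESTROYS all
data on the drive, and it will fail if the drive is security-frozen:

    # Model and firmware revision
    smartctl -i /dev/sdX | egrep -i 'model|firmware'
    # ATA secure-erase: set a temporary user password, then erase
    hdparm --user-master u --security-set-pass p /dev/sdX
    hdparm --user-master u --security-erase p /dev/sdX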