On Sun, Aug 13, 2023 at 11:34 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:

> Ahhhhh, from the original phrasing I thought that Quincy correlated with a sharp drop.

It does, but the causation is in the other direction. We recently started experimenting with Proxmox Backup Server, which is really cool, but it performs enough IO to basically lock out the VM being backed up, leading to IO timeouts, leading to user complaints. :-( We'd always thought this was an unavoidable (but intermittent) problem specific to our workload, until this happened. Once it did, we upgraded to Quincy because that's the version Proxmox currently recommends/supports, and we were approaching it as a Proxmox problem. Then we reproduced it with rados bench within the Ceph cluster, with no Proxmox involvement (rough command in the P.S. below), so we no longer think it's a Proxmox problem.

TL;DR: we upgraded to Quincy because the IO demands of the new backup software made the problem worse, not because Quincy made the problem worse.

> > They have the latest firmware
>
> As per recent isdct/intelmas/sst? The web site?

Yes. It's all "Solidigm" now, which has made information harder to find and firmware harder to get, but these drives aren't exactly getting regular updates at this point.

> Just for grins, I might suggest -- once you have a fully healthy cluster -- destroying, secure-erasing, and redeploying a few OSDs at a time within a single failure domain. How old are the OSDs?

The SSDs are probably 5-8 years old. The OSDs were rebuilt to BlueStore around the Luminous timeframe. (Nautilus, maybe. It was a while ago.)

> I suspect at least some might be Filestore and thus would be redeployed with BlueStore.

They are not; we manually converted them all to BlueStore.

> Newer SSD controllers / models are better than older models at housekeeping over time, so the secure-erase might freshen performance.

I mean... I don't have much else to try, so I may give it a shot! My only hesitation is that there's no real problem indicator I could check afterward, so I don't know how I would tell whether it made a difference unless I did them all and the problem went away. Which, at the speed this thing rebuilds, might well be a 3-month project. :-/

Thanks!
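P.S. For anyone who wants to poke at the same thing: the reproduction was just a plain rados bench write run from inside the cluster, something along these lines (pool name, block size, and thread count are placeholders, tune to taste):

    # hammer the pool with 4M writes for 60s, then remove the benchmark objects
    rados bench -p <test-pool> 60 write -b 4M -t 16 --no-cleanup
    rados -p <test-pool> cleanup

And, mostly as a note to myself, the per-OSD cycle for the secure-erase experiment would be roughly this (just a sketch, not tested here yet; the erase step depends on whether the drive is SATA or NVMe, and on Proxmox the redeploy could also go through pveceph osd create):

    ceph osd out <id>                              # start draining the OSD
    # wait for backfill; 'ceph osd safe-to-destroy osd.<id>' should report OK
    systemctl stop ceph-osd@<id>
    ceph osd purge <id> --yes-i-really-mean-it     # remove it from CRUSH, auth, and the OSD map
    # secure-erase the drive: hdparm --security-erase for SATA, nvme format --ses=1 for NVMe
    ceph-volume lvm create --data /dev/<device>    # redeploy as a fresh BlueStore OSD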